Reducing Non-IID Effects in Federated Autonomous Driving with Contrastive Divergence Loss (Part 3)
In the previous article, we integrated the contrastive divergence loss into federated learning and explored its potential for improving model performance and tackling non-IID data in autonomous driving. In this follow-up piece, we examine several federated learning configurations and present experimental results that support the efficacy of our approach.
1. Implementation
Dataset: Our experiments involve three datasets (see Table 1): Udacity+, Gazebo Indoor, and Carla Outdoor. Udacity+ is a non-IID variant of the Udacity dataset, and the Gazebo and Carla datasets likewise exhibit non-IID characteristics.
Dataset | Total samples | Average samples in each silo (Gaia) | Average samples in each silo (NWS) | Average samples in each silo (Exodus) |
---|---|---|---|---|
Udacity+ | 38,586 | 3,508 | 1,754 | 488 |
Gazebo | 66,806 | 6,073 | 3,037 | 846 |
Carla | 73,235 | 6,658 | 3,329 | 927 |
Table 1: Statistics of the datasets used in our experiments.
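The per-silo averages in Table 1 follow directly from dividing each dataset's total sample count by the number of silos in the corresponding topology (11 for Gaia, 22 for NWS, and 79 for Exodus, as listed later in Table 3). A quick sanity check:

```python
# Sanity check for Table 1: average samples per silo = total samples / number of silos.
totals = {"Udacity+": 38_586, "Gazebo": 66_806, "Carla": 73_235}
silos = {"Gaia": 11, "NWS": 22, "Exodus": 79}

for dataset, total in totals.items():
    averages = {topology: round(total / n) for topology, n in silos.items()}
    print(dataset, averages)
# Udacity+ {'Gaia': 3508, 'NWS': 1754, 'Exodus': 488}
# Gazebo {'Gaia': 6073, 'NWS': 3037, 'Exodus': 846}
# Carla {'Gaia': 6658, 'NWS': 3329, 'Exodus': 927}
```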
Network Topology: Our experiments cover three federated network topologies: the Internet Topology Zoo (Gaia), North American data centers (NWS), and the Zoo Exodus network (Exodus). The main results focus on the Gaia topology, while supplementary results on the NWS and Exodus topologies are provided in our ablation study.
Training: Within each silo, the model is trained with a batch size of 32 and a learning rate of 0.001 using the Adam optimizer. Local training in each silo precedes the transmission and aggregation of models using the specified global aggregation equation. Training runs for 3,600 communication rounds in a simulation environment similar to that described in Nguyen et al. (2022), on an NVIDIA 1080 GPU.
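As a reference point, the per-silo local update described above might look like the following PyTorch sketch; the model, dataloader, and number of local epochs are placeholders, and the MSE steering-angle loss is an assumption for illustration.

```python
import torch
import torch.nn as nn

def local_training_round(model, dataloader, local_epochs=1, lr=1e-3, device="cuda"):
    """One local training pass inside a silo before its weights are sent for aggregation.
    Hyper-parameters mirror the setup above (Adam, lr=0.001; batch size 32 is handled by the dataloader)."""
    model = model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()  # steering-angle regression loss (assumed here)

    for _ in range(local_epochs):
        for images, steering_angles in dataloader:
            images = images.to(device)
            steering_angles = steering_angles.to(device)

            optimizer.zero_grad()
            predictions = model(images).squeeze(-1)  # assumes one steering value per image
            loss = criterion(predictions, steering_angles)
            loss.backward()
            optimizer.step()

    # The updated weights are what gets transmitted and aggregated across silos.
    return {name: tensor.detach().cpu() for name, tensor in model.state_dict().items()}
```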
Baselines: We compare against several recent methods across different learning scenarios, including the Random and Constant baselines of Loquercio et al. (2018). In the Centralized Local Learning (CLL) scenario, we use Inception-V3, MobileNet-V2, VGG-16, and DroNet as baseline models. For Server-based Federated Learning (SFL), we compare against FedAvg, FedProx, and STAR. For the Decentralized Federated Learning (DFL) setting, we evaluate MATCHA, MBST, and FADNet. Model accuracy is assessed using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), while wall-clock time (ms) measures training duration.
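For completeness, both evaluation metrics are standard and can be computed as in the small NumPy sketch below, over flat arrays of predicted and ground-truth steering angles.

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root Mean Square Error between predicted and ground-truth steering angles."""
    y_pred, y_true = np.asarray(y_pred, dtype=float), np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mae(y_pred, y_true):
    """Mean Absolute Error between predicted and ground-truth steering angles."""
    y_pred, y_true = np.asarray(y_pred, dtype=float), np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))
```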
2. Quantitative Results
In practice, we have noticed that the initial phases of federated learning often yield subpar accumulated models. Unlike approaches that tackle the non-IID issue by refining the accumulation step whenever silos transmit their models, we directly mitigate the impact of divergence factors during each silo's local learning phase. Our method minimizes the discrepancy between the distribution of the accumulated weights from neighboring silos in the backbone network (representing divergence factors) and the weights of the current silo in the sub-network (comprising locally learned knowledge). Once the distributions across silos are sufficiently synchronized, we reduce the influence of the sub-network and prioritize the steering angle prediction task. Our Contrastive Divergence Loss, inspired by the contrastive loss of the original Siamese Network, was formulated in detail in the previous article.
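To make the Siamese setup concrete, the sketch below shows one way such a training objective could be wired up in PyTorch. It is illustrative only: the exact Contrastive Divergence Loss is given in the previous article, the class and function names here are hypothetical, and for brevity the discrepancy between the backbone and sub-network is measured in output space with an MSE term and a fixed weight `beta`, rather than over weight distributions as in our actual formulation.

```python
import torch.nn as nn
import torch.nn.functional as F

class SiameseDrivingModel(nn.Module):
    """Illustrative Siamese wrapper: the backbone receives the accumulated
    (aggregated) weights each round, while the sub-network keeps the silo's
    locally learned knowledge."""
    def __init__(self, make_backbone):
        super().__init__()
        self.backbone = make_backbone()
        self.sub_network = make_backbone()

    def forward(self, x):
        return self.backbone(x), self.sub_network(x)

def cdl_style_loss(backbone_out, sub_out, steering_angles, beta=0.5):
    """Simplified stand-in for the Contrastive Divergence Loss: a steering
    regression term plus a discrepancy term pulling the backbone toward the
    sub-network. In practice, beta would be annealed toward zero once the two
    branches are sufficiently synchronized, so the steering task dominates."""
    task_loss = F.mse_loss(backbone_out.squeeze(-1), steering_angles)
    divergence = F.mse_loss(backbone_out, sub_out.detach())
    return task_loss + beta * divergence
```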
Model | Main Focus | Learning Method | RMSE (Udacity+) | RMSE (Gazebo) | RMSE (Carla) | MAE (Udacity+) | MAE (Gazebo) | MAE (Carla) | # Training Parameters | Avg. Cycle Time (ms) |
---|---|---|---|---|---|---|---|---|---|---|
Random | _ | _ | 0.358 | 0.117 | 0.464 | 0.265 | 0.087 | 0.361 | _ | _ |
Constant | Statistical | _ | 0.311 | 0.092 | 0.348 | 0.209 | 0.067 | 0.232 | _ | _ |
Inception | Architecture Design | CLL | 0.209 | 0.085 | 0.297 | 0.197 | 0.062 | 0.207 | 21,787,617 | _ |
MobileNet | Architecture Design | CLL | 0.193 | 0.083 | 0.286 | 0.176 | 0.057 | 0.200 | 2,225,153 | _ |
VGG-16 | Architecture Design | CLL | 0.190 | 0.083 | 0.316 | 0.161 | 0.050 | 0.184 | 7,501,587 | _ |
DroNet | Architecture Design | CLL | 0.183 | 0.082 | 0.333 | 0.150 | 0.053 | 0.218 | 314,657 | _ |
FedAvg | Aggregation Optimization | SFL | 0.212 | 0.094 | 0.269 | 0.185 | 0.064 | 0.222 | 314,657 | 152.4 |
FedProx | Aggregation Optimization | SFL | 0.152 | 0.077 | 0.226 | 0.118 | 0.063 | 0.151 | 314,657 | 111.5 |
STAR | Aggregation Optimization | SFL | 0.179 | 0.062 | 0.208 | 0.149 | 0.053 | 0.155 | 314,657 | 299.9 |
MATCHA | Topology Design | DFL | 0.182 | 0.069 | 0.208 | 0.148 | 0.058 | 0.215 | 314,657 | 171.3 |
MBST | Topology Design | DFL | 0.183 | 0.072 | 0.214 | 0.149 | 0.058 | 0.206 | 314,657 | 82.1 |
FADNet | Topology Design | DFL | 0.162 | 0.069 | 0.203 | 0.134 | 0.055 | 0.197 | 317,729 | 62.6 |
CDL (ours) | Loss Optimization | CLL | 0.169 | 0.074 | 0.266 | 0.149 | 0.053 | 0.172 | 629,314 | _ |
CDL (ours) | Loss Optimization | SFL | 0.150 | 0.060 | 0.208 | 0.104 | 0.052 | 0.150 | 629,314 | 102.2 |
CDL (ours) | Loss Optimization | DFL | 0.141 | 0.062 | 0.183 | 0.083 | 0.052 | 0.147 | 629,314 | 72.7 |
Table 2: Performance comparison between different methods. The Gaia topology is used.
Table 2 summarizes the performance comparison between our proposed method and recent state-of-the-art approaches. The results indicate that our CDL under the Siamese setup with two ResNet-8 models outperforms the other methods by a significant margin, achieving substantial reductions in both RMSE and MAE across all three datasets (Udacity+, Gazebo, and Carla). Although CDL does not increase the parameter count of the deployed network, it does enlarge the model trained in each silo, since the Siamese setup requires an additional sub-network during training. Our CDL with a ResNet-8 backbone outperforms the other baselines most clearly in the DFL scenario, and to a lesser extent in the SFL and CLL setups.
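The parameter behaviour discussed above can be made explicit by counting parameters of the Siamese pair versus the backbone alone, assuming, as the discussion implies, that only the backbone is kept for deployment. The snippet reuses the hypothetical `SiameseDrivingModel` wrapper from the earlier sketch.

```python
def count_parameters(module):
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

model = SiameseDrivingModel(make_backbone)  # hypothetical wrapper from the earlier sketch

training_size = count_parameters(model)           # backbone + sub-network, seen only during training
deployed_size = count_parameters(model.backbone)  # what is actually deployed on the vehicle
print(training_size, deployed_size)               # roughly 2x vs 1x the backbone size
```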
3. Contrastive Divergence Loss Analysis
CDL Performance Across Various Topologies: In practice, training federated algorithms becomes more complex as the topology involves more vehicle data silos. To assess the efficacy of our CDL, we train it and the baseline methods across different topologies. Table 3 compares DroNet, FADNet, and our CDL with a ResNet-8 backbone, trained with DFL on three distributed network infrastructures with varying numbers of silos: Gaia (11 silos), NWS (22 silos), and Exodus (79 silos). The table clearly shows that our CDL achieves superior results across all topology configurations, whereas DroNet encounters divergence issues and FADNet performs suboptimally, particularly on the Exodus topology with 79 silos.
Topology | Architecture | Udacity+ | Gazebo | Carla |
---|---|---|---|---|
Gaia (11 silos) | DroNet | 0.177 (↓0.036) | 0.073 (↓0.011) | 0.244 (↓0.061) |
Gaia (11 silos) | FADNet | 0.162 (↓0.021) | 0.069 (↓0.007) | 0.203 (↓0.020) |
Gaia (11 silos) | CDL (ours) | 0.141 | 0.062 | 0.183 |
NWS (22 silos) | DroNet | 0.183 (↓0.045) | 0.075 (↓0.017) | 0.239 (↓0.057) |
NWS (22 silos) | FADNet | 0.165 (↓0.027) | 0.070 (↓0.012) | 0.200 (↓0.018) |
NWS (22 silos) | CDL (ours) | 0.138 | 0.058 | 0.182 |
Exodus (79 silos) | DroNet | 0.448 (↓0.310) | 0.208 (↓0.147) | 0.556 (↓0.380) |
Exodus (79 silos) | FADNet | 0.179 (↓0.041) | 0.081 (↓0.020) | 0.238 (↓0.062) |
Exodus (79 silos) | CDL (ours) | 0.138 | 0.061 | 0.176 |
Table 3: RMSE under different topologies. Values in parentheses indicate the gap to our CDL within the same topology.
CDL with Various Architectures: Because our CDL operates purely as a loss function, it can be integrated into the Siamese setup with different network architectures, improving performance in each case. Figure 1 shows the effect of CDL on DroNet, FADNet, Inception, MobileNet, and VGG-16 under the Gaia network in the DFL scenario. The results demonstrate that CDL mitigates the non-IID challenge across varied architectures, consistently lifting performance.
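In terms of the illustrative `SiameseDrivingModel` wrapper sketched earlier, swapping the backbone is a one-line change; the example below uses torchvision backbones, and the single-output regression head is an assumption for the steering-angle task.

```python
import torch.nn as nn
from torchvision import models

def make_mobilenet_backbone():
    net = models.mobilenet_v2(weights=None)
    net.classifier[-1] = nn.Linear(net.classifier[-1].in_features, 1)  # single steering-angle output
    return net

def make_vgg16_backbone():
    net = models.vgg16(weights=None)
    net.classifier[-1] = nn.Linear(net.classifier[-1].in_features, 1)
    return net

# Same CDL-style training, different architectures:
siamese_mobilenet = SiameseDrivingModel(make_mobilenet_backbone)
siamese_vgg16 = SiameseDrivingModel(make_vgg16_backbone)
```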
CDL with IID Data: Figure 2 illustrates CDL's behaviour under various data distributions. While CDL is primarily designed to address the non-IID challenge, it also yields marginal performance gains for models trained on IID data. Through the Siamese setup, CDL inherits traits and behaviours akin to the triplet loss; given the triplet loss's established effectiveness on IID data, it follows that CDL can likewise improve model performance when IID data is used.
4. Ablation Study
Figure 3 shows the training results, in RMSE, of our two baselines, DroNet and FADNet, and of our proposed CDL. The results illustrate the convergence behaviour of these methods across the three datasets (Udacity+, Gazebo, Carla) under the NWS and Gaia topologies. Our proposed CDL attains a better convergence point than the two baselines: while DroNet and FADNet struggle to converge or exhibit poor convergence trends, CDL overcomes local optima more effectively and shows less bias toward any specific silo.
5. Discussion
Although our method exhibits promising results, there are several areas for potential improvement that warrant consideration for future work:
Our CDL is specifically designed to address the non-IID problem, meaning its effectiveness heavily depends on the presence of non-IID characteristics within the autonomous driving data. Thus, in scenarios where data distributions across vehicles are relatively consistent or lack significant non-IID factors, the proposed contrastive divergence loss may only lead to limited performance enhancements.
While our proposal has been validated on autonomous driving datasets, real-world testing on actual vehicles has not yet been conducted. Conducting a study involving various driving scenarios, such as interactions with pedestrians, cyclists, and other vehicles, could provide further validation of our method's efficacy.
Although our approach is tailored for autonomous driving applications, its underlying principles may have applicability in other domains where non-IID data is prevalent. Exploring its effectiveness in areas like healthcare, IoT, and industrial settings could expand its potential impact.
Conclusion
We presented a new method to address the non-IID problem in federated autonomous driving using a contrastive divergence loss. Our method directly reduces the effect of divergence factors in the learning process of each silo. Experiments on three benchmark datasets demonstrate that our proposed method performs substantially better than current state-of-the-art approaches. In the future, we plan to test our strategy with more data silos and to deploy the trained model on an autonomous vehicle on real roads.