


Surpassing SOTA by 3.27%, Shanghai Jiao Tong University and others propose a new adaptive local aggregation method
This article introduces a paper accepted at AAAI 2023. The paper was jointly completed by researchers from the Shanghai Key Laboratory of Scalable Computing and Systems at Shanghai Jiao Tong University, Yang Hua from Queen's University Belfast, and Hao Wang from Louisiana State University.
- Paper link: https://arxiv.org/abs/2212.01197
- Code link (including instructions for using the ALA module): https://github.com/TsingZ0/FedALA
This paper proposes an adaptive local aggregation method for federated learning that tackles the statistical heterogeneity problem by automatically capturing the information each client needs from the global model. The authors compared FedALA with 11 SOTA methods and surpassed the best of them by 3.27%. They also applied the adaptive local aggregation module to other federated learning methods and obtained improvements of up to 24.19%.
1 Introduction
Federated learning (FL) keeps users' private data on their local devices without transmitting it, so that participants can jointly exploit the value in user data while protecting privacy. However, since the data of different clients is mutually invisible, the statistical heterogeneity of the data (non-independent and identically distributed (non-IID) data and imbalanced data volumes) has become one of the major challenges of FL. This statistical heterogeneity makes it difficult for traditional federated learning methods (such as FedAvg) to train, through the FL process, a single global model that suits every client.
In recent years, personalized federated learning (pFL) methods have received increasing attention due to their ability to cope with the statistical heterogeneity of data. Unlike traditional FL, which seeks a high-quality global model, the pFL approach aims to train a personalized model suitable for each client with the collaborative computing power of federated learning. Existing pFL research on aggregating models on the server can be divided into the following three categories:
(1) Methods to learn a single global model and fine-tune it, including Per-FedAvg and FedRep;
(2) Methods for learning additional personalization models, including pFedMe and Ditto;
(3) Methods that learn local models through personalized aggregation (or local aggregation), including FedAMP, FedPHP, FedFomo, APPLE, and PartialFed.
The pFL methods in categories (1) and (2) use all the information in the global model for local initialization (that is, initializing the local model before local training in each iteration). However, only the information in the global model that improves the quality of the local model (the information the client needs, matching its local training objective) is beneficial to the client. Because the global model is a generalized model, it contains both information that a given client needs and information that it does not. Therefore, researchers proposed the pFL methods in category (3), which capture the information each client needs from the global model through personalized aggregation. However, the category (3) pFL methods still suffer from (a) not taking the client's local training objective into account (such as FedAMP and FedPHP), (b) high computation and communication costs (such as FedFomo and APPLE), (c) privacy leakage (such as FedFomo and APPLE), and (d) a mismatch between personalized aggregation and the local training objective (such as PartialFed). Furthermore, since these methods substantially modify the FL process, the personalized aggregation they use cannot be directly applied to most existing FL methods.
To accurately capture the information each client needs from the global model without adding any per-iteration communication cost compared with FedAvg, the authors propose Federated learning with Adaptive Local Aggregation (FedALA). As shown in Figure 1, before each round of local training, FedALA aggregates the global model with the local model through the adaptive local aggregation (ALA) module, thereby capturing the required information from the global model. Since, compared with FedAvg, FedALA only uses ALA to modify the local model initialization in each iteration and changes no other part of the FL process, ALA can be directly applied to most existing FL methods to improve their personalization performance.
Figure 1: The local learning process on a client in iteration $t$
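To make the process in Figure 1 concrete, here is a minimal client-side sketch in Python/PyTorch contrasting the two initialization styles. This is an illustrative approximation rather than the authors' released code; the `local_train` helper and the exact `ala.adaptive_local_aggregation` interface are assumptions.

```python
import copy
import torch

def local_train(model, loader, epochs=1, lr=0.01):
    """Ordinary local training loop (stand-in for whatever the client normally runs)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()

def client_iteration_fedavg(global_model, local_model, loader):
    # FedAvg-style initialization: the downloaded global model overwrites the local model.
    local_model.load_state_dict(copy.deepcopy(global_model.state_dict()))
    local_train(local_model, loader)
    return local_model.state_dict()  # uploaded to the server for aggregation

def client_iteration_fedala(global_model, local_model, ala, loader):
    # FedALA-style initialization: blend global and local parameters element-wise
    # with the ALA module; every other step stays exactly as in FedAvg.
    ala.adaptive_local_aggregation(global_model, local_model)
    local_train(local_model, loader)
    return local_model.state_dict()
```

The only difference between the two functions is the initialization line, which is why the ALA module can be bolted onto most existing FL methods.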
2 Method
2.1 Adaptive Local Aggregation (ALA)
Figure 2: Adaptive Local Aggregation (ALA) process
The adaptive local aggregation (ALA) process is shown in Figure 2. In traditional federated learning, the downloaded global model $\Theta^t$ directly overwrites the local model to obtain the local initialization model $\hat{\Theta}_i^t$ (i.e., $\hat{\Theta}_i^t := \Theta^t$). In contrast, FedALA learns a local aggregation weight for each parameter and performs adaptive local aggregation:

$$\hat{\Theta}_i^t := \Theta_i^{t-1} + (\Theta^t - \Theta_i^{t-1}) \odot W_i,$$

where $\Theta_i^{t-1}$ is client $i$'s local model from the previous iteration, $\odot$ denotes the element-wise (Hadamard) product, and the aggregation weights $W_i$ are "updated" element-wise by gradient descent. In addition, the authors apply regularization through element-wise weight clipping, limiting the values in $W_i$ to [0, 1].
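As a quick illustration of the aggregation rule above (a toy sketch with made-up tensors, not tied to the released code), the element-wise blending with clipped weights looks like this for a single parameter tensor:

```python
import torch

theta_global = torch.randn(4, 4)   # parameter downloaded from the server
theta_local = torch.randn(4, 4)    # client's parameter from the previous iteration
w = torch.full((4, 4), 1.0)        # aggregation weights, initialized to 1

# Regularization by element-wise weight clipping: keep w within [0, 1].
w_clipped = torch.clamp(w, 0.0, 1.0)

# Adaptive local aggregation: theta_hat = theta_local + (theta_global - theta_local) ⊙ w
theta_hat = theta_local + (theta_global - theta_local) * w_clipped

# With w = 1 this reduces to the FedAvg-style overwrite (theta_hat == theta_global);
# with w = 0 the client would simply keep its old local parameter.
assert torch.allclose(theta_hat, theta_global)
```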
Because the lower layers of a deep neural network learn more generic information than the higher layers, the authors enable ALA only on the top $p$ layers:

$$\hat{\Theta}_i^t := \Theta_i^{t-1} + (\Theta^t - \Theta_i^{t-1}) \odot [\mathbf{1}; W_i^p],$$

where $p$ represents the number of neural network layers (or neural network blocks) on which ALA is enabled, $\mathbf{1}$ has the same shape as the lower-layer part of $\Theta$, and $W_i^p$ has the same shape as the remaining top $p$ layers of $\Theta$.

The authors initialize all values in $W_i^p$ to 1 and, in each round of local initialization, update $W_i^p$ starting from the old $W_i^p$. To further reduce the computational cost, the authors randomly sample $s\%$ of the local training data $D_i^{s}$ for learning $W_i^p$:

$$W_i^p \leftarrow W_i^p - \eta \, \nabla_{W_i^p} \mathcal{L}_i(\hat{\Theta}_i^t, D_i^{s}),$$

where $\eta$ is the learning rate for updating $W_i^p$. During the learning of $W_i^p$, the authors freeze all trainable parameters other than $W_i^p$.

Figure 3: Learning curves of client 8 on the MNIST and Cifar10 datasets

By choosing a smaller $p$, the number of parameters that need to be trained in ALA can be greatly reduced without affecting the performance of FedALA. Furthermore, as shown in Figure 3, the authors observed that once $W_i^p$ has been trained to convergence in its first training, continuing to train it in subsequent iterations has little impact on the quality of the local model. That is, each client can reuse its old $W_i^p$ to capture the information it needs. The authors therefore only fine-tune $W_i^p$ in subsequent iterations to reduce the computational cost.
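Putting the pieces of Section 2.1 together, the sketch below shows one way such an ALA step could be implemented in PyTorch. It is a simplified approximation under the assumptions above, not the official ALA module from the repository; in particular, `forward_fn` (a functional model call) and the parameter-list layout are assumed for illustration.

```python
import torch

def ala_initialize(global_params, local_params, weights, sampled_batches,
                   loss_fn, forward_fn, eta=1.0, p=2, epochs=1):
    """Sketch of adaptive local aggregation for one client.

    global_params / local_params: lists of parameter tensors with matching shapes.
    weights: learnable aggregation-weight tensors (requires_grad=True) for the top
             p layers, initialized to 1 and reused across FL iterations.
    sampled_batches: batches drawn from the randomly sampled s% of local data.
    forward_fn(params, x): assumed helper that runs the model with the given
             parameter list (e.g. via torch.func.functional_call).
    Returns the initialized local model parameters (theta_hat).
    """
    split = len(global_params) - p  # lower layers take the global model directly

    def blend():
        out = []
        for k, (g, l) in enumerate(zip(global_params, local_params)):
            if k < split:
                out.append(g)  # lower layers: overwritten by the global model
            else:
                w = torch.clamp(weights[k - split], 0.0, 1.0)  # clip to [0, 1]
                out.append(l + (g - l) * w)  # element-wise aggregation
        return out

    for _ in range(epochs):
        for x, y in sampled_batches:
            loss = loss_fn(forward_fn(blend(), x), y)
            grads = torch.autograd.grad(loss, weights)  # only W is trainable here
            with torch.no_grad():
                for w, grad in zip(weights, grads):
                    w -= eta * grad  # gradient-descent update of W

    return [t.detach() for t in blend()]
```

In later FL iterations the same `weights` tensors would be passed in again, so the ALA step only fine-tunes them, matching the reuse behavior described above.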
2.2 ALA Analysis

Without affecting the analysis, for simplicity the authors ignore the $[\mathbf{1}; \cdot]$ term and assume ALA is enabled on all layers, i.e., $\hat{\Theta}_i^t = \Theta_i^{t-1} + (\Theta^t - \Theta_i^{t-1}) \odot W_i$. The gradient with respect to the aggregation weights is

$$\nabla_{W_i} \mathcal{L}_i = \nabla_{\hat{\Theta}_i^t} \mathcal{L}_i \odot (\Theta^t - \Theta_i^{t-1}).$$

According to the above formula,

$$\hat{\Theta}_i^t \leftarrow \hat{\Theta}_i^t - \eta \,(\Theta^t - \Theta_i^{t-1}) \odot \nabla_{\hat{\Theta}_i^t} \mathcal{L}_i \odot (\Theta^t - \Theta_i^{t-1})$$

can be obtained, where $\eta$ represents the learning rate for updating $W_i$. That is, updating $W_i$ in ALA can be viewed as updating $\hat{\Theta}_i^t$, in which the gradient term is scaled element-wise in each round. Different from local model training (or fine-tuning), this update process of $\hat{\Theta}_i^t$ can perceive the generic information in the global model. Moreover, the dynamically changing $(\Theta^t - \Theta_i^{t-1})$ across iteration rounds introduces dynamic information into the ALA module, making it easy for FedALA to adapt to complex environments.
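The equivalence described in this analysis can be checked numerically with a toy example (illustrative only; the quadratic loss is a stand-in for the true local training loss):

```python
import torch

# Updating W by gradient descent moves theta_hat exactly as if theta_hat itself
# were updated with a gradient scaled element-wise by (theta_global - theta_local).
torch.manual_seed(0)
theta_g = torch.randn(5)                 # global model parameters
theta_l = torch.randn(5)                 # previous local model parameters
w = torch.rand(5, requires_grad=True)    # aggregation weights
eta = 0.1

theta_hat = theta_l + (theta_g - theta_l) * w
loss = (theta_hat ** 2).sum()            # toy stand-in for the local loss
grad_theta_hat = 2 * theta_hat           # dL/d(theta_hat) for this toy loss
loss.backward()                          # gives dL/dW = dL/d(theta_hat) * (theta_g - theta_l)

w_new = w.detach() - eta * w.grad
theta_hat_new = theta_l + (theta_g - theta_l) * w_new

# Equivalent view: theta_hat updated directly with the element-wise scaled gradient.
theta_hat_equiv = (theta_hat.detach()
                   - eta * (theta_g - theta_l) * grad_theta_hat.detach() * (theta_g - theta_l))
assert torch.allclose(theta_hat_new, theta_hat_equiv, atol=1e-6)
```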
3 Experiments

The authors used ResNet-18 on the Tiny-ImageNet dataset under the practical data-heterogeneous setting to study how the hyperparameters $s$ and $p$ affect FedALA, as shown in Table 1. For $s$, using more randomly sampled local training data to learn the ALA module makes the personalized model perform better, but it also increases the computational cost. When using ALA, $s$ can be chosen according to each client's computing power; as the table shows, FedALA still performs outstandingly even with an extremely small $s$ (e.g., $s=5$). For $p$, different values of $p$ have almost no impact on the performance of the personalized model, but the difference in computational cost is huge. This phenomenon also illustrates, from one angle, the effectiveness of methods such as FedRep that split the model and keep the neural network layers close to the output on the client without uploading them. When using ALA, a small but suitable $p$ further reduces the computational cost while preserving the performance of the personalized model.

Table 1: Study of the hyperparameters s and p and their impact on FedALA

The authors compared and analyzed FedALA against 11 SOTA methods under both the pathological and the practical data-heterogeneous settings. As shown in Table 2, FedALA outperforms these 11 SOTA methods in all of these cases, where "TINY" means using a 4-layer CNN on Tiny-ImageNet. For example, FedALA exceeds the best baseline by 3.27% in the TINY setting.

Table 2: Experimental results under the pathological and practical data-heterogeneous settings

In addition, the authors evaluated FedALA under different degrees of heterogeneity and different total numbers of clients. As shown in Table 3, FedALA still maintains excellent performance under these conditions.

Table 3: Other experimental results

Based on the experimental results in Table 3, applying the ALA module to other methods achieves an improvement of up to 24.19%.

Finally, the authors visualized on MNIST how adding the ALA module affects model training in the original FL process, as shown in Figure 4. When ALA is not activated, the training trajectory of the model is consistent with that of FedAvg. Once ALA is activated, the model optimizes directly toward the optimum using the information it needs, captured from the global model.

Figure 4: Visualization of the model training trajectory on client 4
