Federated learning (FL) is widely used in the Internet of Things (IoT), wireless networks, mobile devices, autonomous vehicles, and human activity recognition, owing to its strong potential for cybersecurity and privacy protection. Although FL can achieve privacy-preserving, reliable collaborative training without collecting users' private data, it faces many challenges during both training and deployment. The main challenges in FL are (i) the statistical diversity of the participants' data, which makes co-training on non-i.i.d. data difficult, and (ii) the difficulty of application deployment caused by the excessive traffic volume and long communication delays between the central server and the clients. To address these problems, we propose a sparse FL scheme with hierarchical personalized models (sFedHP), which minimizes the clients' loss functions augmented with an approximated $\ell_1$-norm and a hierarchical proximal mapping, reducing the communication and computation loads required in the network while improving performance on statistically diverse data. Convergence analysis shows that the sparsity constraint in sFedHP reduces the convergence speed only slightly, while the communication cost is greatly reduced. Experiments demonstrate the benefits of this sparse hierarchical personalization architecture compared with the client-edge-cloud hierarchical FedAvg and state-of-the-art personalization methods.
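As a rough illustration of the kind of objective sketched above (the exact formulation appears later in the paper; the notation here is our own, and the Moreau-envelope-style personalization and the particular $\ell_1$ smoothing are only plausible assumptions), a personalized FL objective with a sparsity-inducing term can be written as

\[
\min_{w}\; F(w) = \frac{1}{N}\sum_{i=1}^{N} F_i(w),
\qquad
F_i(w) = \min_{\theta_i}\Big\{\, f_i(\theta_i)
  + \frac{\lambda}{2}\,\lVert \theta_i - w \rVert_2^2
  + \gamma\, s_\mu(\theta_i) \Big\},
\]

where $w$ is the shared global model, $\theta_i$ is client $i$'s personalized model, $f_i$ is client $i$'s empirical loss, $\lambda$ weights the proximal coupling between personalized and global models, and $s_\mu(\theta) = \sum_j \sqrt{\theta_j^2 + \mu}$ is one standard smooth surrogate for $\lVert\theta\rVert_1$ that induces sparsity while keeping the objective differentiable.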
Introduction

Machine learning methods have grown rapidly in a wide range of applications thanks to large numbers of labeled training samples [1]. Typically, the samples collected on users' devices, such as mobile phones, are expected to be sent to a centralized server with powerful computing capabilities to train a deep model [2]. However, users are often reluctant to share personal data due to privacy and security concerns, which has motivated the emergence of federated learning (FL) [3]. Federated Averaging (FedAvg) [3] is known as the first FL algorithm to build a global model across different clients while keeping their personal data local. It enables secure co-training that meets privacy and security requirements by sending clients' trained models, instead of their local data, to a centralized server that aggregates them into the global model (a minimal sketch of this aggregation step is given at the end of this section). FL has gradually been adopted in the Internet of Things (IoT), wireless networks, mobile devices, autonomous vehicles, and human activity recognition [4][5][6][7][8], owing to its excellent potential for cybersecurity and privacy protection.

However, FL usually requires frequent communication between the clients and the server to ensure convergence, and this communication suffers from high latency and limited bandwidth. Therefore, communication-efficient methods must be used, such as sparse optimizers [9][10][11]. Alternatively, one-shot FL [12] enables a central server to learn a global model in a single round of communication. Quantization methods [13,14] and multiple local optimization rounds [15,16] have also been utilized to address the limitations on...
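To make the FedAvg aggregation step referenced above concrete, the following is a minimal sketch, not the paper's implementation: the linear model, the `client_sgd` helper, the learning rate, and the sample-count weighting are all our own simplifying assumptions, though the weighted averaging itself follows FedAvg [3].

```python
import numpy as np

def client_sgd(weights, data, labels, lr=0.1, epochs=1):
    """Hypothetical local update: a few epochs of gradient descent
    on a linear model, standing in for a client's private training."""
    w = weights.copy()
    for _ in range(epochs):
        preds = data @ w
        grad = data.T @ (preds - labels) / len(labels)  # MSE gradient
        w -= lr * grad
    return w

def fedavg_round(global_w, client_datasets):
    """One FedAvg communication round: each client trains locally,
    then the server averages the returned models, weighted by the
    number of local samples. Raw data never leaves the clients."""
    client_weights, client_sizes = [], []
    for data, labels in client_datasets:
        client_weights.append(client_sgd(global_w, data, labels))
        client_sizes.append(len(labels))
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    # Three clients with differently shifted (non-i.i.d.) inputs.
    clients = []
    for shift in (-1.0, 0.0, 1.0):
        X = rng.normal(shift, 1.0, size=(50, 2))
        y = X @ true_w + 0.1 * rng.normal(size=50)
        clients.append((X, y))
    w = np.zeros(2)
    for _ in range(50):
        w = fedavg_round(w, clients)
    print("recovered weights:", w)  # should approach [2, -1]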