FedNKD: A Dependable Federated Learning Using Fine-tuned Random Noise and Knowledge Distillation

Zhu, Shaoxiong; Qi, Qi; Zhuang, Zirui; Wang, Jingyu; Sun, Haifeng; Liao, Jianxin

doi:10.1145/3512527.3531372

Cited by 6 publications

(4 citation statements)

References 24 publications

“…In [81], the clients adapt their local model by having some local parameters used for local adaption. Knowledge distillation using a teacher-student model is also a technique that can be applied on the server side [71,124] or the client side [111,134]. The regularization technique is a technique used on the client side in [55,126,138]. Collaboration between clients and servers is sometimes necessary for certain techniques, particularly when it comes to data sharing.…”

Section: Discussion (mentioning)

confidence: 99%

Communication Efficiency and Non-Independent and Identically Distributed Data Challenge in Federated Learning: A Systematic Mapping Study

Alotaibi, Khan, Mahmood

2024

Applied Sciences

Federated learning has emerged as a promising approach for collaborative model training across distributed devices. Federated learning faces challenges such as Non-Independent and Identically Distributed (non-IID) data and communication challenges. This study aims to provide in-depth knowledge in the federated learning environment by identifying the most used techniques for overcoming non-IID data challenges and techniques that provide communication-efficient solutions in federated learning. The study highlights the most used non-IID data types, learning models, and datasets in federated learning. A systematic mapping study was performed using six digital libraries, and 193 studies were identified and analyzed after the inclusion and exclusion criteria were applied. We identified that enhancing the aggregation method and clustering are the most widely used techniques for non-IID data problems (used in 18% and 16% of the selected studies), and a quantization technique was the most common technique in studies that provide communication-efficient solutions in federated learning (used in 27% and 15% of the selected studies). Additionally, our work shows that label distribution skew is the most used case to simulate a non-IID environment, specifically, the quantity label imbalance. The supervised learning model CNN model is the most commonly used learning model, and the image datasets MNIST and Cifar-10 are the most widely used datasets when evaluating the proposed approaches. Furthermore, we believe the research community needs to consider the client’s limited resources and the importance of their updates when addressing non-IID and communication challenges to prevent the loss of valuable and unique information. The outcome of this systematic study will benefit federated learning users, researchers, and providers.

“…Generative model-based methods mainly solve the knowledge fusion problem by training GAN networks to generate client data samples. For example, Zhu et al proposed the FedGEN [10] method to achieve client-side model aggregation by generating a lightweight data generator on the server side. Such as, Zhang et al proposed the FedFTG method [11] to transfer knowledge from local models to global models by exploring the input space of local models through generators.…”

Section: Related Work (mentioning)

confidence: 99%

“…On the other hand, we cannot share clients’ private data in the federated learning environment, so training a teacher model while preserving client privacy remains the new issue. Current research in this direction can be broadly categorized into two main approaches: the public dataset method [7, 8, 9] and the method based on Generative Adversarial Networks (GANs) [10, 11, 12]. Nevertheless, the GAN-based method requires clients to possess significant computational resources, and GAN training is a time-consuming process, limiting its accessibility to some participants.…”

Section: Introduction (mentioning)

confidence: 99%

FedTKD: A Trustworthy Heterogeneous Federated Learning Based on Adaptive Knowledge Distillation

Chen, Zhang, Dong et al.

2024

Entropy

Federated learning allows multiple parties to train models while jointly protecting user privacy. However, traditional federated learning requires each client to have the same model structure to fuse the global model. In real-world scenarios, each client may need to develop personalized models based on its environment, making it difficult to perform federated learning in a heterogeneous model environment. Some knowledge distillation methods address the problem of heterogeneous model fusion to some extent. However, these methods assume that each client is trustworthy. Some clients may produce malicious or low-quality knowledge, making it difficult to aggregate trustworthy knowledge in a heterogeneous environment. To address these challenges, we propose a trustworthy heterogeneous federated learning framework (FedTKD) to achieve client identification and trustworthy knowledge fusion. Firstly, we propose a malicious client identification method based on client logit features, which can exclude malicious information in fusing global logit. Then, we propose a selectivity knowledge fusion method to achieve high-quality global logit computation. Additionally, we propose an adaptive knowledge distillation method to improve the accuracy of knowledge transfer from the server side to the client side. Finally, we design different attack and data distribution scenarios to validate our method. The experiment shows that our method outperforms the baseline methods, showing stable performance in all attack scenarios and achieving an accuracy improvement of 2% to 3% in different data distributions.

“…• Developing FL: 11 (16%) papers were categorised as developing FL further, this included a federated clustering framework [35] and seven papers [36][37][38][39][40][41][42] where clients share synthetic data with the server, rather than model parameters/weights (this might allow quicker model training and reduce communication costs).…”

Section: Categorising the Papers (mentioning)

confidence: 99%

Federated learning for generating synthetic data: a scoping review

Little, Elliot, Allmendinger

2023

IJPDS

Introduction: Federated Learning (FL) is a decentralised approach to training statistical models, where training is performed across multiple clients, producing one global model. Since the training data remains with each local client and is not shared or exchanged with other clients the use of FL may reduce privacy and security risks (compared to methods where multiple data sources are pooled) and can also address data access and heterogeneity problems. Synthetic data is artificially generated data that has the same structure and statistical properties as the original but that does not contain any of the original data records, therefore minimising disclosure risk. Using FL to produce synthetic data (which we refer to as "federated synthesis") has the potential to combine data from multiple clients without compromising privacy, allowing access to data that may otherwise be inaccessible in its raw format. Objectives: The objective was to review current research and practices for using FL to generate synthetic data and determine the extent to which research has been undertaken, the methods and evaluation practices used, and any research gaps. Methods: A scoping review was conducted to systematically map and describe the published literature on the use of FL to generate synthetic data. Relevant studies were identified through online databases and the findings are described, grouped, and summarised. Information extracted included article characteristics, documenting the type of data that is synthesised, the model architecture and the methods (if any) used to evaluate utility and privacy risk. Results: A total of 69 articles were included in the scoping review; all were published between 2018 and 2023 with two thirds (46) in 2022. 30% (21) were focussed on synthetic data generation as the main model output (with 6 of these generating tabular data), whereas 59% (41) focussed on data augmentation. Of the 21 performing federated synthesis, all used deep learning methods (predominantly Generative Adversarial Networks) to generate the synthetic data. Conclusions: Federated synthesis is in its early days but shows promise as a method that can construct a global synthetic dataset without sharing any of the local client data. As a field in its infancy there are areas to explore in terms of the privacy risk associated with the various methods proposed, and more generally in how we measure those risks.
