Decentralized Federated Learning Preserves Model and Data Privacy

Wittkopp, Thorsten; Acker, Alexander

doi:10.1007/978-3-030-76352-7_20

Cited by 16 publications

(7 citation statements)

References 24 publications


“…Decentralized Federated Learning Preserves Model and Data Privacy Wittkopp and Acker [21] propose a decentralized federated learning approach for sharing knowledge between different IT-services in a privacy-aware procedure. The evaluation shows improvements for log-data anomaly detection when training DeepLog models with a teacher-student approach without sharing directly training data or any model parameters.…”

Section: Other Topics (mentioning)

confidence: 99%

Artificial Intelligence for IT Operations (AIOPS) Workshop White Paper

Bogatinovski, Nedelkoski, Acker et al. 2021

Preprint

Self Cite


“…In Federated Learning, multiple parties train a shared model on decentralized privacy-sensitive data that cannot be shared between devices [59]. For that reason, federated learning algorithms prioritize data privacy over training efficiency, often leaving most of the compute resources unused [60,61]. For a more detailed overview of Federated Learning, refer to Appendix A.…”

Section: Distributed Training (mentioning)

confidence: 99%

“…Maintaining data privacy in these conditions also requires specialized techniques that introduce communication overhead. For instance, [61] proposes a system where workers cannot share parameters directly, relying on a secure peer-to-peer knowledge distillation instead.…”

Section: A Federated Learning (mentioning)

confidence: 99%

Distributed Deep Learning in Open Collaborations

Diskin, Bukhtiyarov, Ryabinin et al. 2021

Preprint


Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High-Performance Computing clusters, whose construction and maintenance are both environmentally costly and well beyond the budget of most organizations. As a result, some research directions become the exclusive domain of a few large industrial and even fewer academic actors. To alleviate this disparity, smaller groups may pool their computational resources and run collaborative experiments that benefit all participants. This paradigm, known as grid- or volunteer computing, has seen successful applications in numerous scientific areas. However, using this approach for machine learning is difficult due to high latency, asymmetric bandwidth, and several challenges unique to volunteer computing. In this work, we carefully analyze these constraints and propose a novel algorithmic framework designed specifically for collaborative training. We demonstrate the effectiveness of our approach for SwAV and ALBERT pretraining in realistic conditions and achieve performance comparable to traditional setups at a fraction of the cost. Finally, we provide a detailed report of successful collaborative language model pretraining with 40 participants.


“…The second considerable issue is privacy. While there are attempts at preserving data privacy in federated learning [12], [13], researchers repeatedly find vulnerabilities and manage to reconstruct the original data [14], [15]. These considerations were taken into account in our previous work, C3O, where users have full control over their data as they make contributions to the public training data repository on an entirely voluntary basis.…”

Section: Introduction (mentioning)

confidence: 99%

Training Data Reduction for Performance Models of Data Analytics Jobs in the Cloud

Will, Arslan, Bader et al. 2021

2021 IEEE International Conference on Big Data (Big Data)


Distributed dataflow systems like Apache Flink and Apache Spark simplify processing large amounts of data on clusters in a data-parallel manner. However, choosing suitable cluster resources for distributed dataflow jobs in both type and number is difficult, especially for users who do not have access to previous performance metrics. One approach to overcoming this issue is to have users share runtime metrics to train context-aware performance models that help find a suitable configuration for the job at hand. A problem when sharing runtime data instead of trained models or model parameters is that the data size can grow substantially over time. This paper examines several clustering techniques to minimize training data size while keeping the associated performance models accurate. Our results indicate that efficiency gains in data transfer, storage, and model training can be achieved through training data reduction. In the evaluation of our solution on a dataset of runtime data from 930 unique distributed dataflow jobs, we observed that, on average, a 75% data reduction only increases prediction errors by one percentage point.

