Federated Learning (FL) systems orchestrate the cooperative training of a shared machine learning (ML) model across connected devices. Recently, decentralized FL architectures driven by consensus have been proposed to enable the devices to share and aggregate the ML parameters via direct sidelink communications. This approach has the advantage of promoting federation among the agents even in the absence of a server, but may require intensive use of communication resources compared to vanilla FL methods. This paper proposes a communication-efficient design of consensus-driven FL optimized for the training of deep neural networks (DNNs). Devices independently select fragments of the DNN to be shared with neighbors on each training round. Selection is based on a local optimizer that trades off model quality improvement against sidelink communication resource savings. The proposed technique is validated on a vehicular cooperative sensing use case characterized by challenging real-world datasets and complex DNNs with up to 40 trainable layers, typical of autonomous driving. The impact of layer selection is analyzed under different distributed coordination configurations. The results show that it is better to prioritize the DNN layers possessing few parameters, and that the selection policy should optimally balance gradient sorting and randomization. Latency, accuracy, and communication tradeoffs are analyzed in detail, targeting sustainable federation policies.
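The per-round layer-selection idea described above can be illustrated with a minimal sketch. All names here (`select_layers`, `explore_prob`, the gradient-per-parameter score) are hypothetical illustrations of the general policy, not the paper's actual optimizer: layers are ranked by gradient magnitude normalized by parameter count (so few-parameter layers are favored), and a randomization probability mixes exploratory picks into the gradient-sorted order, all under a per-round sidelink communication budget.

```python
import random


def select_layers(grad_norms, layer_sizes, budget, explore_prob=0.3, seed=None):
    """Pick a subset of DNN layers to share with neighbors this round.

    grad_norms:   per-layer gradient magnitudes (proxy for quality improvement)
    layer_sizes:  per-layer parameter counts (proxy for sidelink cost)
    budget:       max total parameters transmitted this round
    explore_prob: fraction of picks drawn at random instead of by gradient rank
    """
    rng = random.Random(seed)
    # Score layers by gradient magnitude per transmitted parameter, so that
    # small layers carrying informative updates are preferred (reflecting the
    # finding that few-parameter layers should be prioritized).
    ranked = sorted(range(len(layer_sizes)),
                    key=lambda i: grad_norms[i] / layer_sizes[i],
                    reverse=True)
    selected, used = [], 0
    remaining = list(ranked)
    while remaining:
        # Balance gradient sorting with randomization: with probability
        # explore_prob take a random candidate, otherwise the best-ranked one.
        if rng.random() < explore_prob:
            i = remaining.pop(rng.randrange(len(remaining)))
        else:
            i = remaining.pop(0)
        if used + layer_sizes[i] <= budget:
            selected.append(i)
            used += layer_sizes[i]
    return sorted(selected)
```

Setting `explore_prob=0` reduces the policy to pure gradient sorting, while `explore_prob=1` gives fully randomized selection; intermediate values realize the gradient/randomization balance the results point to.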