2018
DOI: 10.48550/arxiv.1810.08313
Preprint

Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD

Jianyu Wang,
Gauri Joshi

Abstract: Large-scale machine learning training, in particular, distributed stochastic gradient descent, needs to be robust to inherent system variability such as node straggling and random communication delays. This work considers a distributed training framework where each worker node is allowed to perform local model updates and the resulting models are averaged periodically. We analyze the true speed of error convergence with respect to wall-clock time (instead of the number of iterations), and analyze how it is affected…
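
Below is a minimal sketch of the local-update training pattern the abstract describes, assuming a toy least-squares objective: each worker runs tau local SGD steps on its own data shard, after which the worker models are averaged. The objective, worker count, learning rate, and averaging period here are illustrative placeholders, not the paper's experimental setup.

# Minimal sketch of local-update SGD with periodic model averaging.
# Illustration only: the quadratic objective, worker count, learning rate,
# and averaging period tau are placeholders, not the authors' configuration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem; rows are sharded across workers to mimic
# data-parallel training.
n_workers, tau, n_rounds, lr = 4, 8, 50, 0.05
A = rng.normal(size=(400, 10))
x_true = rng.normal(size=10)
b = A @ x_true + 0.01 * rng.normal(size=400)
shards = np.array_split(np.arange(400), n_workers)

def stochastic_grad(x, rows):
    # Mini-batch gradient of 0.5 * ||A x - b||^2 restricted to one shard.
    batch = rng.choice(rows, size=32, replace=False)
    A_b = A[batch]
    return A_b.T @ (A_b @ x - b[batch]) / len(batch)

x_global = np.zeros(10)
for _ in range(n_rounds):                      # one round = tau local steps + one averaging step
    local_models = []
    for w in range(n_workers):                 # every worker restarts from the averaged model
        x_local = x_global.copy()
        for _ in range(tau):                   # tau local SGD updates, no communication
            x_local -= lr * stochastic_grad(x_local, shards[w])
        local_models.append(x_local)
    x_global = np.mean(local_models, axis=0)   # periodic averaging: the only communication point

print("distance to x_true:", np.linalg.norm(x_global - x_true))

Averaging is the only communication step, so a larger tau reduces communication overhead per iteration at the cost of slower error convergence per iteration; that tension is the error-runtime trade-off studied in the paper.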

Cited by 22 publications (44 citation statements)
References 24 publications

“…(Communication Pattern) A number of collective communication primitives can be used for data exchange between executors [70], such as Gather, AllReduce, and ScatterReduce. (Synchronization Protocol) The iterative nature of the optimization algorithms may imply certain dependencies across successive iterations, which force synchronizations between executors at certain boundary points [94]. A synchronization protocol has to be specified regarding when such synchronizations are necessary.…”
Section: Communication Mechanism
confidence: 99%
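
The quoted passage names collective primitives (Gather, AllReduce, ScatterReduce) and the synchronization points at which executors must exchange data. As a rough, self-contained illustration (not code from either paper), the sketch below simulates a ring AllReduce as a reduce-scatter pass followed by an all-gather pass over in-memory numpy arrays; a real executor would exchange the chunks over the network through a library such as MPI or NCCL.

# Illustrative simulation of ring AllReduce built from its two phases
# (reduce-scatter, then all-gather), as used at synchronization boundaries.
# Workers are entries in a Python list here; real systems send the chunks
# over the network instead of indexing shared memory.
import numpy as np

def ring_allreduce(worker_grads):
    """Sum gradients across workers by simulating the two ring phases."""
    n = len(worker_grads)
    # Each worker's gradient vector is split into n chunks.
    chunks = [np.array_split(g.astype(float), n) for g in worker_grads]

    # Phase 1: reduce-scatter.  In step s, worker w passes chunk (w - s) mod n
    # to its right neighbour, which accumulates it.  After n - 1 steps,
    # worker w holds the fully reduced chunk (w + 1) mod n.
    for step in range(n - 1):
        for w in range(n):
            src = (w - step) % n
            chunks[(w + 1) % n][src] += chunks[w][src]

    # Phase 2: all-gather.  In step s, worker w forwards the completed chunk
    # (w + 1 - s) mod n to its right neighbour, which overwrites its copy.
    for step in range(n - 1):
        for w in range(n):
            src = (w + 1 - step) % n
            chunks[(w + 1) % n][src] = chunks[w][src].copy()

    return [np.concatenate(c) for c in chunks]

# Four workers, each holding a constant "gradient"; every worker should end
# up with the element-wise sum 1 + 2 + 3 + 4 = 10.
grads = [np.full(8, i + 1.0) for i in range(4)]
reduced = ring_allreduce(grads)
assert all(np.allclose(r, 10.0) for r in reduced)

Splitting the reduction into these two ring phases keeps the amount of data each worker sends roughly independent of the number of workers, which is why AllReduce-style primitives are the default at dense synchronization boundaries.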
“…We have also used data parallelism to implement LambdaML. Other research topics in distributed ML include compression [6,7,52,53,93,96,97,101], decentralization [28,41,59,65,90,91,100], synchronization [4,19,26,46,66,68,87,94,102], straggler [8,56,83,89,98,105], data partition [1,3,36,55,77], etc.…”
Section: Related Work
confidence: 99%
“…• Fast aggregation via over-the-air computation [21], [84], [85], [86]
• Aggregation frequency control with limited bandwidth and computation resources [87], [88], [89]
• Data reshuffling via index coding and pliable index coding for improving training performance [90], [91], [92]
• Straggler mitigation via coded computing [93], [94], [95], [96], [97], [98], [99], [100], [101]
• Training in decentralized system mode [102], [103], [104], [105], [106], [107], [108], [109], [110], [111], [112]…”
Section: Data Partition Based Edge Training Systems
confidence: 99%
“…$\sum_{s=r\tau+1} \nabla L(\Theta_s)$ when $0 \le r < j$ and $Q_r := \sum_{s=r\tau+1}^{r\tau+i-1} \nabla L(\Theta_s)$ when $r = j$. Then, according to Equation (88) in [33], we have…”
Section: Convergence Analysis of DP-PASGD
confidence: 99%
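
Spelled out in full, and assuming the upper limit cut off from the first sum in the truncated quote is $(r+1)\tau$ (the end of averaging round $r$), the blocks $Q_0, \dots, Q_j$ simply partition the cumulative gradient sum up to local step $i-1$ of round $j$:

% Hedged sketch: the (r+1)\tau upper limit is an assumption about the truncated
% quote, inferred from the r = j case and the averaging period \tau.
\[
Q_r := \sum_{s=r\tau+1}^{(r+1)\tau} \nabla L(\Theta_s) \quad (0 \le r < j),
\qquad
Q_j := \sum_{s=j\tau+1}^{j\tau+i-1} \nabla L(\Theta_s),
\qquad
\sum_{r=0}^{j} Q_r = \sum_{s=1}^{j\tau+i-1} \nabla L(\Theta_s).
\]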