2019
DOI: 10.48550/arxiv.1903.06701
Preprint

Scaling Distributed Machine Learning with In-Network Aggregation

Cited by 16 publications (23 citation statements)
References 0 publications
“…However, due to the locality of this observation, the collected information cannot be leveraged for strategic decisions by itself. This information needs to be collected and somehow integrated into a more comprehensive analysis in order to make strategic decisions based on a global view of the acquired information [8], [11].…”
Section: Discussion and Insights
confidence: 99%
“…These headers enable differentiated processing and information gathering within FDs, or applying custom hash functions in order to distinguish specific sets of packets [7]. On the other hand, data aggregation helps to reduce the amount of data that needs to be transmitted in the network from FDs to the Control Plane, for the execution of complex operations there [8].…”
Section: Machine Learning at the Programmable Data Plane for Improved...
confidence: 99%
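The citation above describes data aggregation inside forwarding devices (FDs) as a way to shrink what is shipped to the control plane. The Python sketch below is only illustrative of that idea; the record fields and flow key are hypothetical and are not taken from the cited work. The point is simply that per-packet records collapse into one per-flow summary before export.

```python
from collections import defaultdict

# Hypothetical per-packet records observed in the data plane:
# (src, dst, proto, bytes). Field names are illustrative only.
packets = [
    ("10.0.0.1", "10.0.0.2", "tcp", 1500),
    ("10.0.0.1", "10.0.0.2", "tcp", 1500),
    ("10.0.0.3", "10.0.0.2", "udp", 300),
]

# Aggregate per flow inside the forwarding device, so only one
# (flow -> packet count, byte count) record per flow is exported
# to the control plane instead of one record per packet.
flow_table = defaultdict(lambda: [0, 0])
for src, dst, proto, size in packets:
    entry = flow_table[(src, dst, proto)]
    entry[0] += 1        # packet count
    entry[1] += size     # byte count

for flow, (pkts, nbytes) in flow_table.items():
    print(flow, pkts, "packets,", nbytes, "bytes")
```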
“…Besides gradient compression, there are other application-layer and system-layer optimizations. For example, ByteScheduler [4] orders the gradient transmission of different layers to better overlap with forward computation; and SwitchML [6] uses a programmable switch to aggregate gradients and reduce the communication size. These proposals all suggest a significant reduction in training time.…”
Section: Discussion and Future Work
confidence: 99%
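The SwitchML point in the citation above, i.e. the in-network aggregation technique of the paper tracked on this page, can be pictured as the switch summing equal-sized gradient chunks from all workers and returning a single aggregated chunk, so per-worker traffic scales with the gradient size rather than the worker count. The following is only a conceptual Python simulation under that assumption; it omits the real system's fixed-point quantization, packet format, and switch memory limits.

```python
import numpy as np

def switch_aggregate(chunks):
    """Element-wise sum of equally sized gradient chunks, as a
    programmable switch would compute for one packet slot."""
    return np.sum(np.stack(chunks), axis=0)

# Hypothetical setup: 4 workers, each with a gradient split into chunks.
rng = np.random.default_rng(0)
num_workers, chunk_size, num_chunks = 4, 8, 3
grads = [rng.standard_normal(chunk_size * num_chunks) for _ in range(num_workers)]

aggregated = []
for c in range(num_chunks):
    # Each worker sends chunk c; the "switch" sums the chunks and
    # multicasts one result back to every worker.
    chunk_c = [g[c * chunk_size:(c + 1) * chunk_size] for g in grads]
    aggregated.append(switch_aggregate(chunk_c))

full_sum = np.concatenate(aggregated)
assert np.allclose(full_sum, np.sum(grads, axis=0))
```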
“…In response to this, there has been a surge of research from machine learning and systems communities on improving the communication efficiency of distributed training in recent years [4][5][6][7][8][9][10][11][12][13][14][15][16]. These works are primarily done at the application layer, assuming that the network has done its best to maximize communication efficiency.…”
Section: Introduction
confidence: 99%
“…However, performing gradient compression to reduce the communicated data size is not free. Some recent works (Xu et al., 2020; Sapio et al., 2019; Li et al., 2018b; Gupta et al., 2020) noticed that gradient compression harms the scalability of distributed training in some cases and suggested that these compression techniques are only beneficial for training over slow networks (Lim et al., 2018).…”
Section: Introduction
confidence: 99%
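The trade-off flagged in the citation above, that gradient compression saves bandwidth but is not free, can be illustrated with a generic top-k sparsifier (a stand-in example, not any of the specific schemes cited): the selection pass adds per-step compute on the training critical path that a fast network may not repay.

```python
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Keep only the largest-magnitude entries of a gradient.
    Returns (indices, values); everything else is treated as zero."""
    k = max(1, int(grad.size * ratio))
    idx = np.argpartition(np.abs(grad), -k)[-k:]   # selection cost paid every step
    return idx, grad[idx]

def topk_decompress(idx, values, size):
    out = np.zeros(size, dtype=values.dtype)
    out[idx] = values
    return out

rng = np.random.default_rng(1)
g = rng.standard_normal(1_000_000)
idx, vals = topk_compress(g, ratio=0.01)
g_hat = topk_decompress(idx, vals, g.size)

# ~100x fewer values to transmit, but the reconstruction is lossy and
# the argpartition pass is extra compute per iteration.
print(vals.size, "of", g.size, "values sent")
```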