Proceedings of the 21st ACM Workshop on Hot Topics in Networks 2022
DOI: 10.1145/3563766.3564115

Congestion control in machine learning clusters

Cited by 15 publications (1 citation statement)
References 37 publications
“…Saba requires no modification to applications and no prior knowledge of flow size. Rajasekaran et al observe that unfairly (unequally) sharing the network among ML jobs could lead to shorter training time due to the on-off pattern of DNN training [49]. In contrast, Saba does not make any assumption about the specific pattern of communication and computation, and proposes a general methodology to allocate bandwidth across a heterogeneous set of workloads.…”
Section: Related Work (mentioning)
confidence: 99%