2021
DOI: 10.14778/3447689.3447691

Errata for "Cerebro: a data system for optimized deep learning model selection"

Abstract: We discovered that there was an inconsistency in the communication cost formulation for the decentralized fine-grained training method in Table 2 of our paper [1]. We used Horovod as the archetype for decentralized fine-grained approaches, and its correct communication cost is higher than what we had reported. So, we amend the communication cost of decentralized fine-grained to [EQUATION]

Cited by 15 publications (22 citation statements)
References 1 publication
“…Coordinating HP search jobs must be done carefully to ensure that: each job processes the entire dataset exactly once per epoch. A naive way of doing this is to pre-process the dataset once and reuse across all HP jobs and all epochs as suggested by prior work [58,72]. This approach will not work for two reasons.…”
Section: Coordinated Prep
Citation type: mentioning
confidence: 99%
“…This work therefore fundamentally improves performance on top of what Quiver can achieve. While prior work like Cerebro [72], and DeepIO [103] have looked at optimizing data fetch in distributed training, they do not systematically analyze data stalls in different training scenarios, or demonstrate how to accelerate single-server training.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In addition, since convolutional layers are the most difficult type of layers for efficient recomputation, our solution focuses on convolutional layers to provide high compression ratios with minimum performance overheads and accuracy losses. We also note that COMET can further improve the training performance by combining with model parallelism techniques such as Cerebro [39], which is designed for efficiently training multiple model configurations to select the best model configuration.…”
Section: Training Large-scale DNNs
Citation type: mentioning
confidence: 99%
“…Moving data between workers incurs expensive network traffic and is not viable. Instead, existing solutions use different learning rates across workers [24] or train multiple models concurrently and choose the one that provides the highest accuracy [36]. CROSSBOW [27] considers heterogeneity in the context of multiple GPUs by scheduling a different number of processing streams - learners - on every GPU.…”
Section: Elastic Distributed Training
Citation type: mentioning
confidence: 99%