2022
DOI: 10.1002/cpe.7272

Large‐scale knowledge distillation with elastic heterogeneous computing resources

Abstract: Although more layers and more parameters generally improve the accuracy of a model, such big models have high computational complexity and require large memory, which exceeds the capacity of small devices for inference and incurs long training time. Even on high-performance servers, the long training and inference times of big models are difficult to afford. As an efficient approach to compressing a large deep model (a teacher model) into a compact model (a student model), knowledge…
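The abstract is cut off here, but the title and the visible text point to standard teacher-student knowledge distillation. As a minimal sketch of that general technique (not the paper's specific method), the PyTorch-style loss below combines temperature-scaled soft targets from a teacher with the usual hard-label cross-entropy; `distillation_loss`, `temperature`, and `alpha` are illustrative names and defaults, not values taken from the paper.

```python
# Minimal sketch of a standard teacher-student distillation loss
# (temperature-scaled soft targets + hard-label cross-entropy).
# Function name and hyper-parameters are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Combine a soft-target KL term with ordinary cross-entropy."""
    # Soften both distributions with the same temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    # The KL term is scaled by T^2 so its gradient magnitude stays
    # comparable to the cross-entropy term.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```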

Cited by 5 publications (3 citation statements)
References 32 publications
“…FL only allows the intermediate data to be transferred from the distributed devices, which can be the weights or the gradients of a model. FL generally utilizes a parameter server architecture [9], [10], [11], where a server (or a group of servers) coordinates the training process with numerous devices. To collaboratively train a global model, the server selects (schedules) several devices to perform local model updates based on their local data, and then it aggregates the local models to obtain a new global model.…”
Section: Introduction (mentioning)
confidence: 99%
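The aggregation loop quoted above is essentially the FedAvg pattern: the server schedules a subset of devices, each device performs a local update on its private data, and the server forms the new global model as a weighted average of the returned models. Below is a hedged sketch under that reading; `federated_round`, `dev.train_locally`, and `dev.num_samples` are illustrative placeholders, and the global model is assumed to be a dict of parameter arrays.

```python
# Sketch of one parameter-server style federated round: schedule a subset
# of devices, collect their locally updated models, and average them
# weighted by local sample counts (FedAvg-style). Names are illustrative.
import random

def federated_round(global_model, devices, num_selected=10):
    selected = random.sample(devices, min(num_selected, len(devices)))  # scheduling step
    updates, weights = [], []
    for dev in selected:
        local_model = dev.train_locally(global_model)   # local update on private data
        updates.append(local_model)
        weights.append(dev.num_samples)
    total = sum(weights)
    # Weighted average of parameters -> new global model.
    new_global = {
        name: sum(w / total * m[name] for w, m in zip(weights, updates))
        for name in global_model
    }
    return new_global
```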
“…We apply the parameter server and peer‐to‐peer communication architecture to accelerate the big data processing. The proposed architecture can be deployed in a large‐scale cluster environment or a cloud environment [26], where multiple powerful servers can be offered for parallel computing [27].…”
Section: Introduction (mentioning)
confidence: 99%
“…The proposed architecture can be deployed in a large-scale cluster environment or a cloud environment [26], where multiple powerful servers can be offered for parallel computing [27]. We summarize our main contributions of this article as follows:…”
mentioning
confidence: 99%
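The citing work above pairs a parameter server with peer-to-peer communication to parallelize training across multiple servers. The sketch below only illustrates those two communication patterns in the abstract (server push/pull versus peer-wise all-reduce style averaging); the class and function names are placeholders, not the paper's actual API.

```python
# Hedged sketch contrasting the two communication patterns mentioned in
# the citing work: a parameter-server push/pull step and a peer-to-peer
# (all-reduce style) averaging step. All names are illustrative.
import numpy as np

class ParameterServer:
    def __init__(self, params):
        self.params = params                      # global copy held by the server

    def push_pull(self, gradients, lr=0.01):
        """Workers push gradients; the server applies the mean and returns new params."""
        mean_grad = np.mean(gradients, axis=0)
        self.params -= lr * mean_grad
        return self.params

def peer_to_peer_average(local_params):
    """Each worker ends up with the element-wise mean of all peers' parameters,
    which is what an all-reduce over the worker group computes."""
    mean = np.mean(local_params, axis=0)
    return [mean.copy() for _ in local_params]
```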