2020 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn48605.2020.9207451
Energy-efficient and Robust Cumulative Training with Net2Net Transformation

Abstract: Deep learning has achieved state-of-the-art accuracies on several computer vision tasks. However, the computational and energy requirements associated with training such deep neural networks can be quite high. In this paper, we propose a cumulative training strategy with Net2Net transformation that achieves training computational efficiency without incurring large accuracy loss, in comparison to a model trained from scratch. We achieve this by first training a small network (with fewer parameters) on a small …
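The core idea is to train a compact network first and then grow it into the target architecture with a function-preserving Net2Net transformation, so that the already-learned weights are reused rather than discarded. Below is a minimal PyTorch sketch of one such widening step followed by continued training; the helper name widen_linear, the layer sizes, and the surrounding loop are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def widen_linear(fc1: nn.Linear, fc2: nn.Linear, new_width: int):
    """Net2WiderNet-style, function-preserving widening of the hidden layer
    between fc1 and fc2 (an elementwise activation is assumed in between)."""
    old_width = fc1.out_features
    assert new_width >= old_width
    # Existing units keep their index; each extra unit replicates a random old unit.
    extra = torch.randint(0, old_width, (new_width - old_width,))
    mapping = torch.cat([torch.arange(old_width), extra])
    counts = torch.bincount(mapping, minlength=old_width).float()

    new_fc1 = nn.Linear(fc1.in_features, new_width)
    new_fc2 = nn.Linear(new_width, fc2.out_features)
    with torch.no_grad():
        # Replicate the incoming weights and biases of the selected units.
        new_fc1.weight.copy_(fc1.weight[mapping])
        new_fc1.bias.copy_(fc1.bias[mapping])
        # Scale outgoing weights by 1/replication-count so the output is unchanged.
        new_fc2.weight.copy_(fc2.weight[:, mapping] / counts[mapping])
        new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

# Cumulative training, roughly as described in the abstract: train a small model
# first (e.g. on a data subset), then grow it and keep training the larger model.
small = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
# ... train `small` here ...
fc1, fc2 = widen_linear(small[0], small[2], new_width=256)
large = nn.Sequential(fc1, small[1], fc2)
# ... continue training `large`, reusing the weights learned by `small` ...
```

Because the widened network initially computes the same function as the small one, training can resume without an accuracy drop, which is what makes such a cumulative schedule cheaper than training the large model from scratch.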

Cited by 4 publications (5 citation statements) · References 14 publications

Citation statements:
“…Most research works that investigate efficient training via neural growth focus on how to initialize new neurons/layers (Feng and Panda 2020; Li et al. 2022; Dong et al. 2020). An early study on new neuron initialization employs random initialization (Istrate et al. 2018).…”
Section: Related Work (Training Acceleration via Neural Growth); citation type: mentioning
confidence: 99%
“…However, Net2Net randomly selects neurons to be split, and subsequent work [25] addresses this issue by employing a functional steepest-descent approach to determine the optimal subset of neurons for splitting. The pruning technique [26] has also been employed for reusable neural networks [27]. In addition to [27], another notable study introduces the concept of hierarchical pre-training, which reduces the time required for pre-training and enhances overall performance by leveraging an already pre-trained vision model as an initialization step in the pre-training process.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…To handle this problem, some works (Wu et al., 2020b; Wang et al., 2019b; Wu et al., 2020a) leverage a functional steepest descent idea to decide the optimal subset of neurons to be split. The pruning technique (Han et al., 2015) is also introduced for reusable neural networks (Feng and Panda, 2020). Recently, hierarchical pre-training is proposed by Feng and Panda (2020), which saves training time and improves performance by initializing the pre-training process with an existing pre-trained vision model. In this paper, we study the reusable pre-trained language model and propose a new method, bert2BERT, to accelerate the pre-training of BERT and GPT.…”
Section: Related Work; citation type: mentioning
confidence: 99%