2022
DOI: 10.48550/arxiv.2212.04129
Preprint

Deep Model Assembling

Cited by 2 publications (2 citation statements)
References 0 publications
“…Decoupled Learning breaks down the end-to-end optimization problem of neural network training into smaller subproblems. This is achieved through various techniques such as the use of auxiliary variables (Askari et al, 2018;Li et al, 2019;Taylor et al, 2016;Zhang & Brand, 2017), delayed gradient descent (Huo et al, 2018;Xu et al, 2020), and model assembly (Ni et al, 2022). However, current decoupled learning methods have primarily been developed for training neural networks from scratch and have not yet been extensively explored for fine-tuning already trained large foundation models.…”
Section: Related Work
confidence: 99%
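The model-assembly flavor of decoupled learning referenced in the statement above can be sketched in a few lines. The snippet below is a minimal illustration, not the exact recipe of Ni et al. (2022): a network is split into sub-modules, each sub-module is optimized on its own sub-problem with a small auxiliary head, and the trained pieces are then linked and fine-tuned end-to-end. Names such as make_block and aux_heads are hypothetical.

```python
# Minimal sketch of divide-and-assemble training (illustrative, not the paper's exact method).
import torch
import torch.nn as nn

def make_block(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

blocks = [make_block(32, 64), make_block(64, 64)]    # sub-modules of the full network
aux_heads = [nn.Linear(64, 10), nn.Linear(64, 10)]   # per-block auxiliary classifiers

x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
criterion = nn.CrossEntropyLoss()

# Stage 1: train each sub-module independently (decoupled sub-problems).
# Here block i simply consumes the detached output of block i-1; in practice
# a proxy/meta model would supply these intermediate features.
feats = x
for block, head in zip(blocks, aux_heads):
    opt = torch.optim.SGD(list(block.parameters()) + list(head.parameters()), lr=0.1)
    out = block(feats)
    loss = criterion(head(out), y)
    opt.zero_grad(); loss.backward(); opt.step()
    feats = out.detach()  # no gradient flows across sub-module boundaries

# Stage 2: link the trained sub-modules and fine-tune the assembled model end-to-end.
assembled = nn.Sequential(*blocks, nn.Linear(64, 10))
opt = torch.optim.SGD(assembled.parameters(), lr=0.01)
loss = criterion(assembled(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```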
“…Some existing work on cloud-based distributed learning studied model parallelism methods. Ni et al [33] proposed to divide the model horizontally into submodels, train them in parallel, and link them for final finetuning. Krizhevsky et al [26] proposed group convolution to facilitate model parallelism, which splits the channels of a convolutional layer into multiple groups and performs convolution within each group, thus decoupling their forward processes.…”
Section: Decoupled Structure
confidence: 99%
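The group-convolution decoupling described in the statement above can be demonstrated directly with the groups argument of PyTorch's nn.Conv2d. The sketch below (layer names and sizes are illustrative) shows that a grouped layer produces the same output shape as a dense one while convolving each channel group independently, with a correspondingly smaller parameter count.

```python
# Group convolution: with groups=G, input channels are split into G groups and
# each group is convolved independently of the others.
import torch
import torch.nn as nn

x = torch.randn(1, 8, 16, 16)  # 8 input channels

dense = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=1)
grouped = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=4)  # 4 independent channel groups

print(dense(x).shape, grouped(x).shape)             # both: torch.Size([1, 8, 16, 16])
print(sum(p.numel() for p in dense.parameters()),   # 8*8*3*3 + 8 = 584
      sum(p.numel() for p in grouped.parameters())) # 8*(8/4)*3*3 + 8 = 152
```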