2021
DOI: 10.48550/arxiv.2101.09048
Preprint

Selfish Sparse RNN Training

Abstract: Sparse neural networks have been widely applied to reduce the necessary resource requirements to train and deploy over-parameterized deep neural networks. For inference acceleration, methods that induce sparsity from a pre-trained dense network (dense-to-sparse training) work effectively. Recently, dynamic sparse training (DST) has been proposed to train sparse neural networks without pre-training a dense network (sparse-to-sparse training), so that the training process can also be accelerated. However, previo…
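To make the dense-to-sparse vs. sparse-to-sparse distinction concrete, below is a minimal PyTorch-style sketch of a dynamic sparse training loop. It is not the paper's exact algorithm: `apply_masks`, `update_topology`, and `update_every` are illustrative placeholders, with `update_topology` standing in for whichever prune-and-regrow rule a given DST method uses.

```python
# Minimal sketch of sparse-to-sparse (dynamic sparse) training.
# Assumptions: `masks` maps parameter names to binary tensors, and
# `update_topology(model, masks)` is a user-supplied prune-and-regrow rule.
import torch
import torch.nn.functional as F

def apply_masks(model, masks):
    # Zero out weights that fall outside the current sparse connectivity.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

def train_dst(model, loader, optimizer, masks, update_topology, update_every=100):
    step = 0
    for x, y in loader:
        loss = F.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        apply_masks(model, masks)      # the network stays sparse throughout training
        step += 1
        if step % update_every == 0:
            # Periodically rewire the connectivity: prune weak weights, grow new ones.
            masks = update_topology(model, masks)
            apply_masks(model, masks)
    return masks
```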

Cited by 6 publications (5 citation statements) | References 32 publications
“…[23,24] first introduced the Sparse Evolutionary Training (SET) technique [23], reaching superior performance compared to training with fixed sparse connectivity [72,27]. [28][29][30] leverage "weight reallocation" to improve the performance of the obtained sparse subnetworks. Furthermore, gradient information from the backward pass is utilized to guide the update of the dynamic sparse connectivity [29,25], which produces substantial performance gains.…”
Section: Related Work
confidence: 99%
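As a concrete reference for the SET technique cited above, here is a hedged sketch of one prune-and-regrow step on a single weight tensor: remove the smallest-magnitude active weights, then grow the same number of connections at uniformly random inactive positions. The drop fraction and the flattened-index bookkeeping are illustrative assumptions rather than the exact procedure of Mocanu et al. (2018).

```python
# SET-style topology update for one layer (sketch, not the original implementation).
import torch

def set_update(weight, mask, drop_fraction=0.3):
    active = mask.bool()
    n_drop = int(drop_fraction * active.sum().item())
    if n_drop == 0:
        return mask
    # 1) Prune: deactivate the n_drop active weights with the smallest magnitude.
    scores = torch.where(active, weight.abs(), torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(scores.flatten(), n_drop, largest=False).indices
    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0
    # 2) Grow: activate n_drop inactive positions chosen uniformly at random.
    #    Regrown weights would typically be re-initialized (e.g., to zero or
    #    small random values) by the caller.
    inactive_idx = (new_mask == 0).nonzero(as_tuple=True)[0]
    grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_drop]]
    new_mask[grow_idx] = 1.0
    return new_mask.view_as(mask)
```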
“…DST was first proposed in (Mocanu et al., 2018). Follow-up works improve DST by parameter redistribution (Mostafa & Wang, 2019; Liu et al., 2021a) and gradient-based methods (Dettmers & Zettlemoyer, 2019; Evci et al., 2020). A recent work (Liu et al., 2021b) suggested that successful DST needs to sufficiently explore the possible connections during training.…”
Section: Pruning in the Early Training Stage
confidence: 99%
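The gradient-based variants mentioned above (e.g., Dettmers & Zettlemoyer, 2019; Evci et al., 2020) keep magnitude-based pruning but regrow the inactive connections whose dense gradients are largest. Below is a hedged sketch under the assumption that the dense gradient of the layer is available; the drop fraction and scoring details are illustrative, not any specific paper's exact rule.

```python
# Gradient-guided topology update for one layer (sketch in the spirit of RigL/SNFS).
import torch

def gradient_guided_update(weight, grad, mask, drop_fraction=0.3):
    active = mask.bool()
    n_drop = int(drop_fraction * active.sum().item())
    if n_drop == 0:
        return mask
    # Prune the weakest active weights (smallest |w|).
    prune_scores = torch.where(active, weight.abs(), torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(prune_scores.flatten(), n_drop, largest=False).indices
    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0
    # Grow where |grad| is largest among positions that were inactive before the
    # update, so the dropped and grown sets stay disjoint.
    grow_scores = torch.where(active, torch.full_like(grad, float("-inf")), grad.abs())
    grow_idx = torch.topk(grow_scores.flatten(), n_drop, largest=True).indices
    new_mask[grow_idx] = 1.0
    return new_mask.view_as(mask)
```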
“…These methods are all classified as dense-to-sparse training as they start from a dense network. Dynamic Sparse Training (DST) [43,3,47,8,9,35,34,25] is another class of methods that prune models during training. The key factor of DST is that it starts from a randomly initialized sparse network and optimizes the sparse topology as well as the weights simultaneously during training (sparse-to-sparse training).…”
Section: Related Work
confidence: 99%
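To illustrate the "randomly initialized sparse network" starting point described above, the sketch below builds a random binary mask per weight matrix at a uniform target density before any training. The uniform density is an assumption made for brevity; the cited methods typically use layer-wise allocations such as Erdős–Rényi.

```python
# Random sparse masks at initialization (sketch; uniform per-layer density assumed).
import torch

def random_sparse_masks(model, density=0.1):
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() >= 2:  # sparsify weight matrices, keep biases and norms dense
            masks[name] = (torch.rand_like(p) < density).float()
    return masks
```

These masks would then be kept fixed in size and periodically rewired by a topology-update rule such as the ones sketched above.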
“…We consequently propose a parameter-efficient method to regenerate new connections during the gradual pruning process. Different from the existing works on pruning understanding, which mainly focus on dense-to-sparse training [41] (training a dense model and pruning it to the target sparsity), we also consider sparse-to-sparse training (training a sparse model yet adaptively re-creating the sparsity pattern), which has recently received an upsurge of interest in machine learning [43,3,9,47,8,36,35].…”
Section: Introduction
confidence: 99%
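For the dense-to-sparse side referenced in this statement, here is a hedged sketch of gradual magnitude pruning with the cubic sparsity ramp of Zhu & Gupta (2017); the schedule constants are illustrative, and the connection-regeneration mechanism proposed by the citing paper is not reproduced here.

```python
# Gradual magnitude pruning (sketch): sparsity ramps cubically from 0 to the target.
import torch

def cubic_sparsity(step, start_step, end_step, final_sparsity):
    # Fraction of weights that should be pruned at `step`.
    if step < start_step:
        return 0.0
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)

def magnitude_prune_mask(weight, sparsity):
    # Keep the (1 - sparsity) fraction of weights with the largest magnitude.
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = torch.kthvalue(weight.abs().flatten(), k).values
    return (weight.abs() > threshold).float()
```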