2020
DOI: 10.48550/arxiv.2010.03533
Preprint

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

Abstract: Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why training unstructured sparse networks from random initialization performs poorly…

Cited by 8 publications (21 citation statements)
References 13 publications
“…We show that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization. Our subsequent experiments on common vision tasks give strong credence to the hypothesis in Evci et al (2020b) that lottery tickets simply retrain to the same regions (although not necessarily to the same basin). These results imply that existing lottery tickets could not have been found without the preceding dense training by iterative magnitude pruning, raising doubts about the use of the lottery ticket hypothesis.…”
supporting
confidence: 56%
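As a rough illustration of the linear mode connectivity test invoked in this statement, the sketch below interpolates between two trained solutions along a straight line in weight space and reports the loss barrier. This is a minimal sketch, assuming two PyTorch models with identical architectures and a user-supplied `evaluate(model, loader)` helper; the names are illustrative and not taken from the cited papers.

```python
# Minimal sketch of a linear mode connectivity check (assumed helper names).
import copy
import torch

def loss_barrier(model_a, model_b, evaluate, loader, steps=11):
    """Loss along the straight line between two solutions; a small barrier is
    commonly read as the two solutions lying in one linearly connected region."""
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        mixed = {}
        for k in state_a:
            if state_a[k].is_floating_point():
                mixed[k] = (1 - alpha) * state_a[k] + alpha * state_b[k]
            else:
                mixed[k] = state_a[k]  # integer buffers (e.g. batch-norm counters)
        probe.load_state_dict(mixed)
        losses.append(evaluate(probe, loader))
    # Barrier: worst interpolated loss minus the average of the endpoint losses.
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```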
“…IMP Interpretation. Evci et al (2020b) recently proposed a possible interpretation for the behavior of IMP and the success of the LTH. The authors posit that lottery tickets cannot be considered random initializations but that a lottery ticket contains a prior for rediscovering the solution of the model from which it was pruned.…”
Section: Introduction
mentioning
confidence: 99%
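For context, the iterative magnitude pruning (IMP) loop with weight rewinding that this interpretation concerns can be sketched as follows. This is a minimal sketch, assuming placeholder `build_model` and `train` helpers; the global pruning schedule is illustrative rather than the exact procedure of the cited works.

```python
# Minimal sketch of iterative magnitude pruning (IMP) with weight rewinding.
# `build_model` and `train` are assumed placeholders.
import copy
import torch

def imp_with_rewinding(build_model, train, rounds=5, prune_frac=0.2):
    model = build_model()
    init_state = copy.deepcopy(model.state_dict())           # state to rewind to
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
             if p.dim() > 1}                                  # prune weight tensors only

    for _ in range(rounds):
        train(model, masks)  # the trainer is expected to keep masked weights at zero

        # Globally rank surviving weights by magnitude, drop the smallest fraction.
        scores = torch.cat([(p.abs() * masks[n]).flatten()
                            for n, p in model.named_parameters() if n in masks])
        alive = scores[scores > 0]
        k = max(1, int(prune_frac * alive.numel()))
        threshold = torch.kthvalue(alive, k).values
        for n, p in model.named_parameters():
            if n in masks:
                masks[n] = (p.abs() * masks[n] > threshold).float()

        # Rewind: the "lottery ticket" is the pair (init_state, masks).
        model.load_state_dict(init_state)

    return model, masks
```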
“…They also allow reallocation of the weights across the layers. In [11,7,25,12], the gradient information is used to determine which connections would be changed during the evolution phase.…”
Section: Related Work
mentioning
confidence: 99%
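To make the gradient-guided "evolution" step concrete, here is a rough sketch of a drop-and-grow update for one sparse layer in the spirit of RigL (Evci et al., 2020): drop the smallest-magnitude active connections and regrow the same number where the dense gradient magnitude is largest. The function signature and update fraction are illustrative assumptions, not the exact rule of the cited methods.

```python
# Minimal sketch of one gradient-guided drop-and-grow step for a sparse layer.
import torch

def drop_and_grow(weight, mask, grad, update_frac=0.3):
    n_update = max(1, int(update_frac * int(mask.sum())))

    # Drop: among active connections, remove those with the smallest |w|.
    drop_scores = torch.where(mask.bool(), weight.abs(),
                              torch.full_like(weight, float('inf')))
    drop_idx = torch.topk(drop_scores.view(-1), n_update, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0

    # Grow: among inactive connections, enable those with the largest |grad|.
    grow_scores = torch.where(mask.bool(),
                              torch.full_like(grad, -float('inf')),
                              grad.abs())
    grow_idx = torch.topk(grow_scores.view(-1), n_update).indices
    mask.view(-1)[grow_idx] = 1.0
    weight.data.view(-1)[grow_idx] = 0.0   # newly grown connections start at zero
    return mask
```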
“…Pruning. Pruning strategies for neural networks can be roughly separated into structured (Han et al., 2016; Li et al., 2016; Liu et al., 2017) and unstructured (Evci et al., 2019, 2020; Frankle and Carbin, 2018; Han et al., 2015) variants. Structured pruning, as considered in this work, prunes parameter groups instead of individual weights, allowing speedups to be achieved without sparse computation (Li et al., 2016).…”
Section: Related Work
mentioning
confidence: 99%
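To illustrate the distinction drawn in that statement, the sketch below builds a per-weight (unstructured) mask and a per-filter (structured) mask for a convolutional weight. The thresholds and filter norms used here are illustrative assumptions rather than the exact criteria of the cited works; the practical point is that structured pruning leaves a smaller dense layer, whereas unstructured pruning needs sparse kernels to realize speedups.

```python
# Minimal sketch contrasting unstructured and structured magnitude pruning
# for a conv weight of shape (out_channels, in_channels, kH, kW).
import torch

def unstructured_mask(weight, sparsity=0.9):
    """Per-weight pruning: zero individual weights below a magnitude threshold."""
    k = max(1, int(sparsity * weight.numel()))
    threshold = torch.kthvalue(weight.abs().flatten(), k).values
    return (weight.abs() > threshold).float()

def structured_mask(weight, prune_frac=0.5):
    """Per-filter pruning: drop whole output channels, so the surviving layer
    is simply a smaller dense convolution (no sparse kernels required)."""
    filter_norms = weight.abs().sum(dim=(1, 2, 3))         # L1 norm of each filter
    n_keep = weight.shape[0] - int(prune_frac * weight.shape[0])
    keep = torch.topk(filter_norms, n_keep).indices
    mask = torch.zeros_like(weight)
    mask[keep] = 1.0
    return mask
```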