Truly Sparse Neural Networks at Scale

Curci, Selima; Mocanu, Decebal Constantin; Pechenizkiy, Mykola

doi:10.21203/rs.3.rs-133395/v1

Cited by 4 publications

(1 citation statement)

References 35 publications

(20 reference statements)


“…Sparse neuron architectures can be achieved by other means: Mollaysa et al [2017] enforce sparsity based on the Jacobian and Li et al [2016], Lee et al [2006], Ranzato et al [2007], Collins and Kohli [2014], Ma et al [2019] employ ℓ1-based LASSO penalty to induce sparsity. Curci et al [2021] prune their ANNs based on a metric for neuron importance. Evci et al [2019] discuss the difficulty of training sparse ANNs.…”

Section: Introduction (mentioning)

confidence: 99%

A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks

Ma, Sardy, Hengartner et al. 2022

Preprint


To fit sparse linear associations, a LASSO sparsity inducing penalty with a single hyperparameter provably allows to recover the important features (needles) with high probability in certain regimes even if the sample size is smaller than the dimension of the input vector (haystack). More recently learners known as artificial neural networks (ANN) have shown great successes in many machine learning tasks, in particular fitting nonlinear associations. Small learning rate, stochastic gradient descent algorithm and large training set help to cope with the explosion in the number of parameters present in deep neural networks. Yet few ANN learners have been developed and studied to find needles in nonlinear haystacks. Driven by a single hyperparameter, our ANN learner, like for sparse linear associations, exhibits a phase transition in the probability of retrieving the needles, which we do not observe with other ANN learners. To select our penalty parameter, we generalize the universal threshold of Donoho and Johnstone (1994) which is a better rule than the conservative (too many false detections) and expensive cross-validation. In the spirit of simulated annealing, we propose a warm-start sparsity inducing algorithm to solve the high-dimensional, non-convex and non-differentiable optimization problem. We perform precise Monte Carlo simulations to show the effectiveness of our approach.


Stochastic spatio-temporal optimization for control and co-design of systems in robotics and applied physics

2021


A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks

Sardy, Hengartner et al. 2022

Stat Comput


To fit sparse linear associations, a LASSO sparsity inducing penalty with a single hyperparameter provably allows to recover the important features (needles) with high probability in certain regimes even if the sample size is smaller than the dimension of the input vector (haystack). More recently learners known as artificial neural networks (ANN) have shown great successes in many machine learning tasks, in particular fitting nonlinear associations. Small learning rate, stochastic gradient descent algorithm and large training set help to cope with the explosion in the number of parameters present in deep neural networks. Yet few ANN learners have been developed and studied to find needles in nonlinear haystacks. Driven by a single hyperparameter, our ANN learner, like for sparse linear associations, exhibits a phase transition in the probability of retrieving the needles, which we do not observe with other ANN learners. To select our penalty parameter, we generalize the universal threshold of Donoho and Johnstone (Biometrika 81(3):425–455, 1994) which is a better rule than the conservative (too many false detections) and expensive cross-validation. In the spirit of simulated annealing, we propose a warm-start sparsity inducing algorithm to solve the high-dimensional, non-convex and non-differentiable optimization problem. We perform simulated and real data Monte Carlo experiments to quantify the effectiveness of our approach.
