2017
DOI: 10.1016/j.neucom.2017.02.029
Group sparse regularization for deep neural networks

Abstract: In this paper, we address the challenging task of simultaneously optimizing (i) the weights of a neural network, (ii) the number of neurons for each hidden layer, and (iii) the subset of active input features (i.e., feature selection). While these problems are traditionally dealt with separately, we propose an efficient regularized formulation enabling their simultaneous parallel execution, using standard optimization routines. Specifically, we extend the group Lasso penalty, originally proposed in the linear …
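To make the quoted idea concrete, the following is a minimal sketch of a group-Lasso penalty in which each group collects the outgoing weights of one unit, so driving a whole group to zero effectively removes that input feature or neuron. PyTorch is assumed here (the paper itself is framework-agnostic), and the helper name group_lasso_penalty, the coefficient lam, and the toy network sizes are hypothetical choices, not the authors' code.

```python
# Minimal sketch (PyTorch assumed; not the authors' implementation) of a
# group-Lasso penalty with one group per input unit of a linear layer.
import math
import torch
import torch.nn as nn

def group_lasso_penalty(layer: nn.Linear) -> torch.Tensor:
    # layer.weight has shape (out_features, in_features); column j collects
    # the outgoing connections of input unit j and forms one group.
    group_norms = torch.linalg.norm(layer.weight, dim=0)      # one L2 norm per group
    return math.sqrt(layer.out_features) * group_norms.sum()  # sqrt(group size) weighting

net = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 1))
lam = 1e-3  # hypothetical regularization strength
x, y = torch.randn(32, 20), torch.randn(32, 1)

loss = nn.functional.mse_loss(net(x), y)
loss = loss + lam * sum(group_lasso_penalty(m) for m in net if isinstance(m, nn.Linear))
loss.backward()
```

Because the penalty is applied at the group level rather than to individual weights, minimizing it tends to zero out entire columns, which corresponds to discarding whole input features or hidden neurons rather than scattered connections.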

Cited by 396 publications (268 citation statements) | References 38 publications
“…The most important change of our training method is the application of L1 regularization to the weights [23]. While L1 regularization is primarily a way to avoid overfitting, it has a desirable side effect: sparsifying the network's weight matrices.…”
Section: The Training Methods (mentioning)
confidence: 99%
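As an illustration of the training change described in the quote above, a minimal sketch of adding an L1 penalty on the weight matrices to the training loss is shown below (PyTorch assumed; the coefficient lam, the network sizes, and the helper l1_penalty are hypothetical and not taken from [23]).

```python
# Hedged sketch of L1-regularized training: the penalty discourages
# overfitting and, as a side effect, pushes many weights toward zero.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
lam = 1e-4  # hypothetical L1 coefficient

def l1_penalty(model: nn.Module) -> torch.Tensor:
    # Sum of absolute values over every weight matrix (biases left unpenalized).
    return sum(p.abs().sum() for n, p in model.named_parameters() if "weight" in n)

x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
loss = nn.functional.cross_entropy(net(x), y) + lam * l1_penalty(net)
opt.zero_grad()
loss.backward()
opt.step()
```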
“…While L1 regularization is primarily a way to avoid overfitting, it has a desirable side effect: sparsifying the network's weight matrices. While most state-of-the-art methods of pruning devote considerable computational expense to find the least influential weights, using L1 regularization ensures that the majority of the weights are already effectively zero and can be pruned without affecting the network at all [23].…”
Section: The Training Methods (mentioning)
confidence: 99%
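The pruning step described in this excerpt can be sketched as a simple magnitude threshold applied after L1-regularized training; the threshold value and the helper name prune_small_weights below are hypothetical choices, not values from the cited work.

```python
# Illustrative magnitude pruning after L1-regularized training: weights whose
# magnitude falls below a small threshold are zeroed out.
import torch
import torch.nn as nn

@torch.no_grad()
def prune_small_weights(model: nn.Module, threshold: float = 1e-3) -> float:
    total, pruned = 0, 0
    for name, p in model.named_parameters():
        if "weight" in name:
            mask = p.abs() < threshold
            p[mask] = 0.0
            total += p.numel()
            pruned += int(mask.sum())
    return pruned / total  # fraction of weights removed

net = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
sparsity = prune_small_weights(net)  # near zero on an untrained network
print(f"pruned {sparsity:.1%} of the weights")
```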
“…Other work focusing on the sparsity of parameters in DNN is also related to this article. In [22], group Lasso penalty is adopted to impose group-level sparsity on networks connections. But this work can hardly satisfy our standardization constraints, i.e.…”
Section: Relation With Previous Work (mentioning)
confidence: 99%
“…In this case, a pruned model can take advantage of dense matrix computation just as the original model does. L 1 norm regularization is commonly used to enforce individual parameter sparsity, and Group Lasso [39] is normally used to enforce structural parameter sparsity [32].…”
Section: Related Work (mentioning)
confidence: 99%
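To illustrate why group-level (structural) sparsity preserves dense computation, the sketch below removes hidden units whose outgoing weight groups are near zero, producing narrower but still dense layers. It is an illustrative reconstruction (PyTorch assumed), not code from [32] or [39]; the function shrink_hidden_layer and the tolerance tol are hypothetical.

```python
# After group-sparse training, hidden units whose outgoing weight group is
# (near) zero cannot influence the output and can be dropped, leaving
# smaller but still dense weight matrices.
import torch
import torch.nn as nn

@torch.no_grad()
def shrink_hidden_layer(fc1: nn.Linear, fc2: nn.Linear, tol: float = 1e-6):
    # Column j of fc2.weight holds the outgoing weights of hidden unit j.
    keep = torch.linalg.norm(fc2.weight, dim=0) > tol
    idx = keep.nonzero(as_tuple=True)[0]
    new_fc1 = nn.Linear(fc1.in_features, len(idx))
    new_fc2 = nn.Linear(len(idx), fc2.out_features)
    new_fc1.weight.copy_(fc1.weight[idx])
    new_fc1.bias.copy_(fc1.bias[idx])
    new_fc2.weight.copy_(fc2.weight[:, idx])
    new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2  # dense layers of reduced width

fc1, fc2 = nn.Linear(784, 256), nn.Linear(256, 10)
fc1_small, fc2_small = shrink_hidden_layer(fc1, fc2)
```

In contrast, per-weight (L1) sparsity leaves zeros scattered inside the matrices, which only pays off with sparse kernels, whereas the group-level variant above keeps every remaining matrix dense.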