2020
DOI: 10.48550/arxiv.2008.02965
Preprint

Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations

Cited by 4 publications (5 citation statements)
References 12 publications

“…The ℓ2 regularization [30] is the most common method; it adds an ℓ2 penalty term to the loss function. The ℓ1 regularization [31] corresponds to a Laplace prior on the weights; it adds an ℓ1 penalty to the loss so that the coefficients of input variables uncorrelated with the output can be shrunk exactly to 0. However, Weight Scale Shifting (WSS) in standard deep learning models may make the effect of ℓ2 regularization less pronounced [32]. Dropout [33][34][35][36] deactivates individual neurons with a certain probability during the forward pass of the training phase.…”
Section: Regularization Techniques (citation type: mentioning; confidence: 99%)
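
To make the WSS point concrete, here is a minimal NumPy sketch (our own illustration, not code from [32] or the citing paper): rescaling two consecutive ReLU layers by c and 1/c leaves the network function unchanged while changing the ℓ2 penalty, so weight decay alone does not pin down a unique parameterization.

```python
# Our own NumPy illustration of Weight Scale Shifting (WSS), not code from the
# cited papers: scaling one ReLU layer's weights by c > 0 and the next layer's
# by 1/c leaves the network function unchanged but changes the l2 penalty.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))   # first-layer weights
W2 = rng.normal(size=(1, 16))   # second-layer weights
x = rng.normal(size=(8,))

def forward(W1, W2, x):
    # ReLU is positively homogeneous: relu(c * z) = c * relu(z) for c > 0
    return W2 @ np.maximum(W1 @ x, 0.0)

def l2_penalty(*Ws):
    return sum(float((W ** 2).sum()) for W in Ws)

c = 10.0
W1s, W2s = c * W1, W2 / c       # weight scale shift between the two layers

print(np.allclose(forward(W1, W2, x), forward(W1s, W2s, x)))  # True: same function
print(l2_penalty(W1, W2), l2_penalty(W1s, W2s))               # penalties differ
```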
“…Besides these, there are also standard regularization methods such as parameter sharing [37], max-norm regularization [38], gradient clipping [39], and WEISSI [32]. In this paper, we utilize Stochastic Shared Embedding (SSE) regularization [15], a data-driven regularization method.…”
Section: Regularization Techniques (citation type: mentioning; confidence: 99%)
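
As context for the methods listed in this statement, the PyTorch sketch below is a generic illustration of gradient clipping and a max-norm constraint; it is not the SSE regularization used by the citing paper, nor the WEISSI method.

```python
# Generic PyTorch illustration of gradient clipping and max-norm regularization
# (two of the standard methods listed above); this is not the SSE method.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 8), torch.randn(32, 1)

opt.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Gradient clipping: rescale the global gradient norm to at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()

# Max-norm constraint: after the update, project each weight row back onto
# the l2 ball of radius 3.0.
with torch.no_grad():
    for m in model.modules():
        if isinstance(m, nn.Linear):
            m.weight.copy_(torch.renorm(m.weight, p=2, dim=0, maxnorm=3.0))
```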
“…Other recent works propose related forms of regularization and argue that these are sometimes better than weight decay. The authors of [24] introduced the "path regularizer", a generalization of the regularizer in [26] to deep neural networks, and showed that it can lead to solutions that generalize better and are more robust [4,15]. Similarly, [21] exploit the homogeneity of ReLU neural networks and propose a "scale shift invariant" algorithm. A proximal-gradient-type algorithm for the 1-path-norm is proposed in [18], which focuses on the ‖w‖₁‖v‖₁ norm of a homogeneous unit (w, v) in shallow networks.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
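
For readers unfamiliar with the 1-path-norm mentioned here, the NumPy sketch below is our own illustration of that quantity for a shallow ReLU network; it shows the invariance to per-unit weight rescalings (by ReLU homogeneity) that the ℓ2 penalty lacks, but it does not reproduce the proximal algorithm of [18].

```python
# Our own NumPy illustration of the 1-path-norm of a shallow ReLU network
# y = v^T relu(W x): the sum over input->hidden->output paths of |v_j * W_ji|.
# It is invariant under per-unit rescalings (c_j * W_j, v_j / c_j), which leave
# the network function unchanged, while the l2 penalty is not.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 8))    # one row of W per hidden unit
v = rng.normal(size=(16,))      # output weights

def path_norm_1(W, v):
    # sum_{j,i} |v_j| * |W_ji| = sum_j |v_j| * ||W_j||_1
    return float(np.abs(v) @ np.abs(W).sum(axis=1))

c = rng.uniform(0.1, 10.0, size=16)          # per-unit positive rescalings
W_s, v_s = c[:, None] * W, v / c             # same input-output function

print(np.isclose(path_norm_1(W, v), path_norm_1(W_s, v_s)))          # True
print((W ** 2).sum() + (v ** 2).sum(),
      (W_s ** 2).sum() + (v_s ** 2).sum())                           # differs
```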
“…The weight scale shifting issue is also discussed in the adversarial setting [22]: the scale of the weights can be shifted between layers without changing the input-output function specified by the network, which can limit the capacity of standard penalties to regularize the model. A weight-scale-shift-invariant regularization is then proposed and shown to improve adversarial robustness.…”
Section: Linearity Exploration In Adversary (citation type: mentioning; confidence: 99%)
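
One simple way to see what a weight-scale-shift-invariant penalty can look like is the product of per-layer norms; the PyTorch sketch below is a hedged illustration of that invariance idea only, and the exact regularizer proposed in [22] (the indexed paper) may take a different form.

```python
# A hedged PyTorch sketch of a weight-scale-shift-invariant penalty: the product
# of per-layer Frobenius norms is unchanged when W_l -> c * W_l and
# W_{l+1} -> W_{l+1} / c, unlike the sum of squared norms used by weight decay.
# This illustrates the invariance idea only; the regularizer proposed in the
# indexed paper may differ.
import torch
import torch.nn as nn

def scale_shift_invariant_penalty(model: nn.Module) -> torch.Tensor:
    penalty = torch.ones(())
    for m in model.modules():
        if isinstance(m, nn.Linear):
            penalty = penalty * m.weight.norm()
    return penalty

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
x, y = torch.randn(32, 8), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y) + 1e-3 * scale_shift_invariant_penalty(model)
loss.backward()
```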