2019
DOI: 10.1007/s11263-019-01269-y

SSN: Learning Sparse Switchable Normalization via SparsestMax

Abstract: Normalization methods improve both optimization and generalization of ConvNets. To further boost performance, the recently-proposed switchable normalization (SN) provides a new perspective for deep learning: it learns to select different normalizers for different convolution layers of a ConvNet. However, SN uses the softmax function to learn importance ratios to combine normalizers, leading to redundant computations compared to a single normalizer. This work addresses this issue by presenting Sparse Switchable Norm…
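As a rough illustration of the mechanism the abstract describes (not the authors' released code), the sketch below blends instance, layer, and batch statistics with softmax "importance ratios"; the class name SwitchableNorm2d and the single-logit-per-normalizer parameterization are illustrative assumptions. SSN's point is that these softmax ratios are dense, so all three normalizers must be computed, whereas sparse ratios would let a layer fall back to a single normalizer.

```python
# Minimal sketch of a softmax-weighted switchable normalization layer.
# Hypothetical names and parameterization; a simplified stand-in, not SN/SSN's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableNorm2d(nn.Module):
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_features, 1, 1))
        # One importance logit per normalizer (IN, LN, BN), for means and variances.
        self.mean_logits = nn.Parameter(torch.zeros(3))
        self.var_logits = nn.Parameter(torch.zeros(3))

    def forward(self, x):                          # x: (N, C, H, W)
        mean_in = x.mean((2, 3), keepdim=True)     # instance statistics
        var_in = x.var((2, 3), keepdim=True, unbiased=False)
        mean_ln = x.mean((1, 2, 3), keepdim=True)  # layer statistics
        var_ln = x.var((1, 2, 3), keepdim=True, unbiased=False)
        mean_bn = x.mean((0, 2, 3), keepdim=True)  # batch statistics
        var_bn = x.var((0, 2, 3), keepdim=True, unbiased=False)

        w_mean = F.softmax(self.mean_logits, dim=0)  # dense importance ratios;
        w_var = F.softmax(self.var_logits, dim=0)    # SSN constrains these to be sparse
        mean = w_mean[0] * mean_in + w_mean[1] * mean_ln + w_mean[2] * mean_bn
        var = w_var[0] * var_in + w_var[1] * var_ln + w_var[2] * var_bn
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight + self.bias
```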

Cited by 6 publications (6 citation statements)
References 42 publications
“…Since its output sums to one, this invariably means less weight is assigned to the relevant items, potentially harming performance and interpretability (Jain and Wallace, 2019). This has motivated a line of research on learning networks with sparse mappings (Martins and Astudillo, 2016; Niculae and Blondel, 2017; Louizos et al., 2018; Shao et al., 2019). We focus on a recently-introduced flexible family of transformations, α-entmax (Blondel et al., 2019; Peters et al., 2019), defined as:…”
Section: Sparse Attention (mentioning)
confidence: 99%
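The excerpt truncates the formula after "defined as:"; for context only, the standard α-entmax definition from the cited Blondel et al. (2019) and Peters et al. (2019) papers is reproduced below (α = 1 recovers softmax, α = 2 recovers sparsemax, and any α > 1 can produce sparse outputs).

```latex
% alpha-entmax (Blondel et al., 2019; Peters et al., 2019), reproduced for
% context because the quoted excerpt truncates the definition.
\[
\alpha\text{-}\mathrm{entmax}(\mathbf{z})
  = \operatorname*{arg\,max}_{\mathbf{p}\,\in\,\triangle^{d}}
    \mathbf{p}^{\top}\mathbf{z} + \mathsf{H}^{\mathsf{T}}_{\alpha}(\mathbf{p}),
\qquad
\mathsf{H}^{\mathsf{T}}_{\alpha}(\mathbf{p}) =
\begin{cases}
  \dfrac{1}{\alpha(\alpha-1)}\displaystyle\sum_{j}\bigl(p_{j}-p_{j}^{\alpha}\bigr), & \alpha \neq 1,\\[6pt]
  -\displaystyle\sum_{j} p_{j}\log p_{j}, & \alpha = 1,
\end{cases}
\]
% Here $\triangle^{d}$ is the probability simplex and $\mathsf{H}^{\mathsf{T}}_{\alpha}$
% is the Tsallis alpha-entropy.
```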
“…Sparse attention. Prior work has developed sparse attention mechanisms, including applications to NMT (Martins and Astudillo, 2016; Malaviya et al., 2018; Niculae and Blondel, 2017; Shao et al., 2019; Maruf et al., 2019). Peters et al. (2019) introduced the entmax function this work builds upon.…”
Section: Related Work (mentioning)
confidence: 99%
“…(6). The optimization of binary variables has been well established in the literature [22, 18, 17, 26], and these techniques can also be used to train DGConv. The gate parameters are optimized with a Straight-Through Estimator, similar to recent network quantization approaches, which is guaranteed to converge [5].…”
Section: Construction of the Relationship Matrix (mentioning)
confidence: 99%
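To make the quoted training recipe concrete, here is a minimal, hedged PyTorch-style sketch of a Straight-Through Estimator for binary gates: the forward pass hard-thresholds real-valued logits to {0, 1}, while the backward pass copies the gradient through unchanged so the logits remain trainable. The class name, the threshold at zero, and the toy objective are illustrative assumptions, not the cited DGConv implementation.

```python
# Straight-Through Estimator (STE) sketch for binary gate parameters.
# Illustrative only; not the cited paper's code.
import torch

class BinaryGate(torch.autograd.Function):
    @staticmethod
    def forward(ctx, logits):
        # Hard binarization in the forward pass: gate is 1 when the logit >= 0.
        return (logits >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: treat the binarization as the identity,
        # so gradients flow to the underlying real-valued logits.
        return grad_output

gate_logits = torch.zeros(8, requires_grad=True)  # learnable gate parameters
gates = BinaryGate.apply(gate_logits)             # binary {0, 1} values in forward
loss = (gates * torch.randn(8)).sum()             # toy objective
loss.backward()                                   # gradients reach gate_logits via STE
```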
“…When dealing with shallow features, since IN has good robustness in this case, it is used as the primary normalization method, so the weight of IN is larger than those of BN and LN, and all weights are learned through backpropagation. When extracting deep features, the weight of BN is relatively large, which improves the expressive ability of the proposed self-adaptive normalization after feature processing [21].…”
Section: Methods (mentioning)
confidence: 99%
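With a softmax-combined normalization layer like the hypothetical SwitchableNorm2d sketch shown after the abstract, the per-layer pattern described in this quote (IN dominating in shallow layers, BN in deep ones) can be checked by reading out the learned importance ratios after training; the helper below is illustrative and assumes that earlier sketch, not code from the cited paper.

```python
# Illustrative helper (assumes the hypothetical SwitchableNorm2d sketch above):
# collect the learned mean-importance ratios of every switchable layer in a model.
import torch.nn.functional as F

def importance_ratios(model):
    ratios = {}
    for name, module in model.named_modules():
        if isinstance(module, SwitchableNorm2d):
            w = F.softmax(module.mean_logits, dim=0).tolist()
            ratios[name] = {"IN": w[0], "LN": w[1], "BN": w[2]}
    return ratios
```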