Squeeze-and-Excitation Networks
Preprint, 2017
DOI: 10.48550/arxiv.1709.01507

Cited by 453 publications (628 citation statements)
References 0 publications
“…Note, some papers have also focused on ResNet-50 training [2,27,49], but they have either modified the architecture or changed the resolution, which does not allow for a direct comparison to the original ResNet-50 at resolution 224×224. For instance, Lee et al [27] use ResNet-D [14] with SE attention [20]. Bello et al [2] also optimize ResNet without architectural changes, but they don't report competitive results for ResNet-50 at 224×224.…”
Section: Related Work
confidence: 99%
“…We solely consider the training recipe. Therefore we exclude all variations of the ResNet-50 such as SE-ResNet-50 [20] or ResNet-50-D [14], which usually improve the accuracy under the same training procedure. In summary, in this paper,…”
Section: Introduction
confidence: 99%
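For context on the SE attention cited as [20] above (and as [40] in the excerpts below): it is a channel-attention block that squeezes each feature map to a per-channel descriptor by global average pooling, passes it through a small bottleneck MLP, and uses a sigmoid gate to rescale the channels. A minimal PyTorch sketch, not the authors' reference implementation; the default reduction ratio of 16 follows the paper:

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal Squeeze-and-Excitation block: squeeze by global average
    pooling, excite with a bottleneck MLP, rescale channels with a sigmoid gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))            # squeeze: global average pool -> (b, c)
        w = self.fc(s).view(b, c, 1, 1)   # excitation: per-channel weights in (0, 1)
        return x * w                      # rescale the feature map channel-wise

# Example: attach SE to a 64-channel feature map
feat = torch.randn(2, 64, 56, 56)
out = SEBlock(64)(feat)                   # same shape, channel-reweighted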
“…Besides, as indicated by the bowl shape of the curves, a medium number of features in each channel performs best when F × C is fixed. The reason is that when F is too large the number of channels is too small and there are not enough degrees of freedom for dynamic adjustment of the attention maps, while when F is too small each channel does not contain enough features to effectively capture the global information [40]. So, we choose to reshape the feature vector into 192 channels with 16 features in each channel.…”
Section: A. Impacts of Network Parameters
confidence: 99%
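The quoted trade-off keeps the flattened feature size fixed at F × C = 3072 and varies how it is split into channels. A hypothetical sketch of the chosen split (192 channels × 16 features) with a generic SE-style gate over the channels; the batch size and reduction ratio here are illustrative assumptions, not values from the cited work:

import torch
import torch.nn as nn

# Fixed budget F * C = 3072 (192 channels x 16 features, the setting the
# excerpt reports as best); other splits trade channel count against
# per-channel feature size.
F_PER_CHANNEL, CHANNELS = 16, 192
flat = torch.randn(8, F_PER_CHANNEL * CHANNELS)   # (batch, 3072) feature vectors

x = flat.view(-1, CHANNELS, F_PER_CHANNEL)        # (batch, C=192, F=16)

# SE-style gate over the C channels: one scale factor per channel.
gate = nn.Sequential(
    nn.Linear(CHANNELS, CHANNELS // 16),
    nn.ReLU(inplace=True),
    nn.Linear(CHANNELS // 16, CHANNELS),
    nn.Sigmoid(),
)
squeezed = x.mean(dim=2)                          # (batch, C) channel descriptors
weights = gate(squeezed).unsqueeze(-1)            # (batch, C, 1) scale factors
out = x * weights                                 # channel-reweighted features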
“…• As is illustrated in the third subfigure, all scale factors in the fourth attention map are 0.5, which is due to the zero output of the former ReLU activation function and the Sigmoid activation function used to predict the attention map. Therefore, the last attention module is actually useless and can be removed during testing to further reduce the complexity [40].…”
Section: The Role of Attention
confidence: 99%
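The removal argument in this excerpt follows from the gate's nonlinearity: if the ReLU preceding the gate outputs all zeros, the sigmoid maps every pre-activation to 0.5, so each channel is scaled by the same constant and the module carries no channel-specific attention. A self-contained sketch of that degenerate case; the shapes are illustrative:

import torch

# sigmoid(0) = 0.5 for every channel when the preceding ReLU outputs zeros
zero_pre_activation = torch.zeros(192)
scales = torch.sigmoid(zero_pre_activation)
assert torch.all(scales == 0.5)

# The attention map then multiplies every channel by the same constant,
# so the module can be removed at test time and the 0.5 factor folded
# into an adjacent linear layer.
x = torch.randn(8, 192, 16)
attended = x * scales.view(1, -1, 1)   # identical to 0.5 * x
assert torch.allclose(attended, 0.5 * x)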