2016
DOI: 10.1007/978-3-319-46493-0_38

Identity Mappings in Deep Residual Networks

Abstract: Deep residual networks [1] have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings […]
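The propagation formulations referred to above can be stated compactly. Writing $x_l$ for the input to the $l$-th residual unit and $\mathcal{F}$ for the residual function, the paper's analysis (sketched here, with the after-addition activation taken as the identity) gives:

```latex
% Residual unit with identity skip and identity after-addition activation
x_{l+1} = x_l + \mathcal{F}(x_l, \mathcal{W}_l)

% Unrolled: any deeper unit L sees the shallow signal x_l plus a residual sum
x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i)

% Backward pass: the additive "1" gives a direct gradient path to any x_l
\frac{\partial \mathcal{E}}{\partial x_l}
  = \frac{\partial \mathcal{E}}{\partial x_L}
    \left( 1 + \frac{\partial}{\partial x_l}
               \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right)
```

The additive term $1$ means the gradient of the loss $\mathcal{E}$ reaches any shallower unit directly, unattenuated by the intermediate weight layers, which is the sense in which signals "directly propagate" between blocks.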

Cited by 7,492 publications (6,142 citation statements)
References 12 publications
“…Conditional computation [Bengio et al., 2015] and adaptive computation [Graves, 2016] propose to adjust the amount of computation by using a policy to select data. Many of these static and dynamic techniques are used in standard deep architectures such as ResNet [He et al., 2016a] and Inception [Szegedy et al., 2017], usually with a loss of accuracy. Different from these static and dynamic techniques, our method explicitly formulates test-time efficiency as an amortized constrained sequential decision problem, such that the expected computational cost, in terms of FLOPs, can be greatly reduced, with even improved accuracy, by adaptively assigning training examples of varying difficulty to their best classifiers.…”
Section: Related Work (mentioning)
confidence: 99%
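To make the early-exit idea concrete, here is a minimal Python sketch (all names are hypothetical, and a fixed confidence threshold stands in for the cited paper's learned, amortized policy): each example is routed through classifiers ordered from cheap to expensive and exits at the first sufficiently confident prediction, so only hard examples pay for the deep models.

```python
import torch
import torch.nn.functional as F

def cascade_predict(x, classifiers, threshold=0.9):
    """Early-exit cascade sketch; `classifiers` is ordered cheap -> expensive.

    `x` is a single-example batch of shape (1, ...). The example exits at the
    first classifier whose top softmax probability clears `threshold`, so the
    expected FLOPs fall below always running the deepest network.
    """
    for clf in classifiers[:-1]:
        probs = F.softmax(clf(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:  # confident enough: exit early
            return pred
    # Hard example: fall back to the most expensive classifier.
    return classifiers[-1](x).argmax(dim=-1)
```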
“…Deep residual networks [He et al., 2016a] have been widely used in the image classification field since they were proposed. We choose ResNet as our model's baseline because we can easily build a sequence of networks, from shallow to deep, by adjusting the number of units in each block.…”
Section: Cascaded Classifiers Using ResNet (mentioning)
confidence: 99%
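A short sketch of that shallow-to-deep construction (hypothetical helper names; the width is held fixed here for brevity, whereas real ResNets also widen and downsample between stages): the depth of each family member is controlled by a single unit count.

```python
import torch.nn as nn

class BasicUnit(nn.Module):
    """Tiny residual unit used only to illustrate depth scaling."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut around the unit

def resnet_trunk(units_per_stage: int, channels: int = 16) -> nn.Sequential:
    """Builds 3 stages of residual units; depth grows with `units_per_stage`."""
    units = [BasicUnit(channels) for _ in range(3 * units_per_stage)]
    return nn.Sequential(*units)

# e.g. resnet_trunk(3) yields a shallow cascade member, resnet_trunk(18) a deep one.
```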
“…One such architecture is a novel ultra-deep residual learning network (ResNet) [4]. This architecture can be implemented by adding so-called 'shortcut connections' [5], which skip one or more layers. They perform a mapping such that their outputs are added to the outputs of the stacked layers.…”
Section: Convolutional Neural Network (mentioning)
confidence: 99%
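The shortcut connection just described amounts to computing x + F(x). Below is a minimal sketch of a pre-activation residual block in PyTorch (an assumed framework; an illustration in the style of He et al., 2016, not the authors' released code):

```python
import torch
import torch.nn as nn

class PreActResidualBlock(nn.Module):
    """Pre-activation residual block: BN -> ReLU -> Conv twice, then add.

    Because activation happens before each convolution, both the skip
    connection and the after-addition path are pure identity mappings.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(self.relu(self.bn1(x)))   # stacked layers compute F(x)
        out = self.conv2(self.relu(self.bn2(out)))
        return x + out                             # shortcut adds x unchanged
```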
“…For one, spelling errors are quite prevalent in learners' written production (Kochmar, 2011). Additionally, spelling errors have been shown to be influenced by phonological L1 transfer (Grigonytė and Hammarberg, 2014). […] (He et al., 2015, 2016). Such skip connections facilitate error propagation to earlier layers in the network, which allows for building deeper networks.…”
Section: Spelling Features (mentioning)
confidence: 99%