2020
DOI: 10.1109/access.2020.3031685

A Comparative Analysis of Residual Block Alternatives for End-to-End Audio Classification

Abstract: Residual learning is known for being a learning framework that facilitates the training of very deep neural networks. Residual blocks or units are made up of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or shortcut connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stac…
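The abstract's point about where the skip connection is applied can be illustrated with a toy fully-connected sketch. This is not the paper's code: the function names `post_activation_block` and `pre_activation_block` are illustrative, and dense layers stand in for the convolutions used in practice.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def post_activation_block(x, w1, w2):
    # Original ResNet ordering: weight -> ReLU -> weight, then the input
    # is added back and a final non-linearity is applied on top of the sum.
    out = relu(x @ w1)
    out = out @ w2
    return relu(out + x)  # the skip connection passes through the last ReLU

def pre_activation_block(x, w1, w2):
    # "Identity mapping" ordering: activations come before each weight
    # layer, so the shortcut path itself is left completely untouched.
    out = relu(x) @ w1
    out = relu(out) @ w2
    return out + x  # clean identity path from input to output

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
y_post = post_activation_block(x, w1, w2)
y_pre = pre_activation_block(x, w1, w2)
```

With zero weights, the pre-activation block reduces exactly to the identity (`out + x` with `out = 0`), while the post-activation block returns `relu(x)`, i.e. the final ReLU still distorts the shortcut; this is the kind of placement difference the paper compares.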

Cited by 8 publications (3 citation statements)
References 25 publications
“…Network component ResUnit indicates a residual connection after two CBM operations. The residual connection solves the problem of gradient disappearance [20]. Its principle is to take the output of the previous layer as part of the input of the later layer, creating a shortcut that lets information pass directly into the deep network, which effectively avoids the problems of vanishing and exploding gradients.…”
Section: The Establishment Of
confidence: 99%
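The claim in the quote above — that the shortcut counters vanishing gradients — can be checked with a one-dimensional sketch (not from the cited paper; `tanh` and the weight value are arbitrary illustrative choices):

```python
import numpy as np

def residual_forward(x, w):
    # Residual unit: output = F(x) + x, with F(x) = tanh(w * x)
    # as a stand-in for the stacked-layer branch.
    return np.tanh(w * x) + x

def residual_grad(x, w):
    # d/dx [tanh(w*x) + x] = w * (1 - tanh(w*x)**2) + 1.
    # The "+ 1" term comes from the identity shortcut: even when the
    # branch gradient w * (1 - tanh^2) shrinks toward zero, the total
    # gradient stays close to 1, so it cannot vanish at this unit.
    return w * (1.0 - np.tanh(w * x) ** 2) + 1.0

x = np.array([3.0])
w = 0.01  # a nearly "dead" branch: its own gradient is about 0.01
branch_only = w * (1.0 - np.tanh(w * x) ** 2)  # tiny
total = residual_grad(x, w)                    # close to 1
```

Stacking many such units multiplies gradients that are each near 1 rather than near 0, which is the mechanism behind the "shortcut lets information directly enter the deep network" statement.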
“…There are different published approaches for AAC. Regarding input audio encoding, some approaches use recurrent neural networks (RNNs) [2], [3], [4], others 2D convolutional neural networks (CNNs) [5], [6], and some others the Transformer model [7], [8]. However, RNNs are known to have difficulties in learning temporal information [9], 2D CNNs model time-frequency but not temporal patterns [10], and the Transformer was not originally designed for sequences of thousands of time-steps [7].…”
Section: Introduction
confidence: 99%
“…However, RNNs are known to have difficulties in learning temporal information [9], 2D CNNs model time-frequency but not temporal patterns [10], and the Transformer was not originally designed for sequences of thousands of time-steps [7]. For generating the captions, the Transformer decoder [6], [11], [8] or RNNs [1], [3], [5] are mostly employed, and the alignment of input audio and output captions is typically implemented with an attention mechanism [12], [11]. Also, some approaches adopt a multitask setup, where the AAC method is regularized by the prediction of keywords based on the input audio [6], [11], [13].…”
Section: Introduction
confidence: 99%