2022
DOI: 10.1016/j.patrec.2022.07.012
|View full text |Cite
|
Sign up to set email alerts
|

ERANNs: Efficient residual audio neural networks for audio pattern recognition

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

2
13
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
6
1
1

Relationship

0
8

Authors

Journals

citations
Cited by 39 publications
(17 citation statements)
references
References 7 publications
2
13
0
Order By: Relevance
“…Efficient CNNs for Audio Tagging: Searching for efficient CNN architectures for AT and scaling models has been investigated before [1,3,2]. Efficient vision architectures, such as EfficientNets [8,9] and MobileNets [5,6,7] have shown to provide a good performance-complexity trade-off also in the audio domain [1,2,10].…”
Section: Related Workmentioning
confidence: 99%
See 2 more Smart Citations
“…Efficient CNNs for Audio Tagging: Searching for efficient CNN architectures for AT and scaling models has been investigated before [1,3,2]. Efficient vision architectures, such as EfficientNets [8,9] and MobileNets [5,6,7] have shown to provide a good performance-complexity trade-off also in the audio domain [1,2,10].…”
Section: Related Workmentioning
confidence: 99%
“…Audio Tagging (AT) is the task of assigning one or multiple semantic labels to an audio clip. Until recently, CNNs have dominated the field of AT [1,2,3]. CNNs are a wellstudied and understood architecture for processing spectrograms and include well-suited inductive biases such as the locality bias, weight sharing and translation equivariance.…”
Section: Introductionmentioning
confidence: 99%
See 1 more Smart Citation
“…Popular method to solve the audio classification task is usage of convolutional neural networks (CNN). In the way of representing the signal in the form of MFCCs it is possible to use 2D convolutional layers [Verbitskiy et al (2022)], this technique allows to train the model on the patterns of speech in terms of time, frequency and dynamic.…”
Section: Audio Emotionsmentioning
confidence: 99%
“…A milestone for audio pattern recognition was the release of AudioSet [8], a dataset containing over 5000 h of audio recordings with 527 sound classes. Several CNN based models have been proposed for large scale audio classifi-cation [9][10][11][12][13]; however, pretrained audio neural network or PANN [14] is a VGGish [15] CNN-based model that achieves the state-of-the-art result for Audioset classification task. In the next section, we present the approach to detect the missing subtitle speech blocks.…”
Section: Audio Category Classificationmentioning
confidence: 99%