2018
DOI: 10.1016/j.knosys.2018.07.033

Audio classification using attention-augmented convolutional neural network

Abstract: Audio classification, as a set of important and challenging tasks, groups speech signals according to speakers' identities, accents, and emotional states. Due to the high dimensionality of the audio data, task-specific handcrafted feature extraction is always required and is regarded as cumbersome for various audio classification tasks. More importantly, the inherent relationship…

Published by: Elsevier

Cited by 42 publications (34 citation statements)
References 29 publications
“…Moreover, the classification accuracy of the method was not good in case of the presence of environmental noises but was effective in the case of clean environments. Wu et al [1] developed an attention-augmented CNN that enhanced the features generated from the frequency bands. The drawback of the method was regarding the higher number of the local frequency segments as this would increase the parameters in the model, affecting the performance of the system.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…Audio classification differentiates the audio using the emotion, identity, accents, and other parameters of the speakers, which when performed effectively, can contribute the tasks, like translation from speech-to-speech and automatic recognition of the speech [1]. For example, the emotions of the sound are employed for the translation of the utterances spoken in a language to another [2] that facilitates the ability to handle the non-linguistic data and ensures the practical way of translation.…”
Section: Introduction (mentioning)
confidence: 99%
“…In addition, batch normalization [17] is attached to the convolutional layer for accelerating the learning. Secondly, since the event occurring timestamp is important to distinguish between events, we also consider local attention which aims to capture the onset and offset of the events by taking the local patch features as input [18,19]. Specifically, along the frequency axis with the interval set to 1, the global spatial descriptor S is split into local parts, denoted as local descriptors s_1, s_2, …”
Section: Spatial Attention (mentioning)
confidence: 99%
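
The splitting step described in the last statement above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the global descriptor S is a 2-D array shaped (frequency bins, time frames); the quoted text does not give the shape, and the function name `split_local_descriptors` is hypothetical. The attention weighting that the citing paper applies afterwards is not shown.

```python
import numpy as np

def split_local_descriptors(S, interval=1):
    """Split a (freq, time) array into local parts along the frequency axis.

    The layout of S and this function name are assumptions made for this
    sketch; they are not taken from the cited paper.
    """
    n_freq = S.shape[0]
    # Each local descriptor covers `interval` consecutive frequency bins.
    return [S[i:i + interval] for i in range(0, n_freq, interval)]

# Toy descriptor: 4 frequency bins, 3 time frames.
S = np.arange(12, dtype=float).reshape(4, 3)
local = split_local_descriptors(S, interval=1)
```

With the interval set to 1, each local descriptor is a single frequency row, so the number of local parts equals the number of frequency bins. This is consistent with the drawback noted in the first statement, where more local frequency segments mean more model parameters.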