2019
DOI: 10.1109/access.2019.2939495

Learning Attentive Representations for Environmental Sound Classification

Abstract: Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. The classification performance is heavily dependent on the effectiveness of representative features extracted from the environmental sounds. However, ESC often suffers from the semantically irrelevant frames and silent frames. In order to deal with this, we employ a frame-level attention model to focus on the semantically relevant frames and salient frames. Specifically, we first propose a convolutional recurrent…

Cited by 61 publications (48 citation statements)
References 46 publications
“…The advantage of this method is that the process to manually extract features is cancelled. However, 1D-CNN extracts features at the global level without considering the temporal structure and frequency feature of environmental sounds [3]. In second category method, the network is trained by features extracted from raw signal, such as spectrogram and MFCC.…”
Section: Related Work (mentioning)
confidence: 99%
“…As one of the branches, the accuracy of speech classification and music classification has reached a considerable level, even exceeding the ability of human auditory perception [1], [2]. However, as another branch of speech recognition, environmental sound classification (ESC) still faces many difficulties in various aspects, such as nonstationary nature of environment sound and the strong interference of ambient noise [3]. On the other side, ESC research has an effect on the construction of smart cities [4].…”
Section: Introduction (mentioning)
confidence: 99%
“…Li et al [17] proposed a multi-stream network with temporal attention in which the structure is composed of three streams, each containing a single temporal attention vector. Zhang et al [9] integrated temporal attention into its CRNN architecture and the same authors [18] proposed a model that combines channel attention and temporal attention together.…”
Section: Introduction (mentioning)
confidence: 99%
“…To manage the obstacles, we considered the role of the attention mechanism. As a novel intelligent method, the attention mechanism, which has the capability to adaptively capture temporal correlations between different sequences [23] and allows for feature extraction networks to focus on the relevant characteristics without signal processing technology and feature engineering, are commonly explored in various structural prediction tasks, such as document classification [24], speech recognition [25][26][27], and environmental classification [28,29]. Therefore, in this paper, we propose a novel ABD method for gear fault diagnosis under different working conditions based on a multi-scale convolutional learning structure and attention mechanism.…”
Section: Introduction (mentioning)
confidence: 99%