Attention based convolutional recurrent neural network for environmental sound classification

Zhang, Zhichao; Xu, Shugong; Zhang, Shunqing; Qiao, Tianhao; Cao, Shan

doi:10.1016/j.neucom.2020.08.069

Cited by 84 publications

(55 citation statements)

References 16 publications

“…In the work of Zhang et al [16], a convolutional RNN architecture with an attention mechanism, namely ACRNN, was proposed. Attention for both CNN and RNN layers was investigated.…”

Section: Attention Mechanisms (mentioning)

confidence: 99%

“…As shown in Tables I, the two model architectures of A-BiLSTM and A-BiGRU have the same structures, with the only differing aspect being the type of the RNN network, i.e., LSTM or GRU, implemented in layers 1 and 3. The choice of implementing the attention mechanism in the second layer of each model was influenced by the suggestion in [16], which demonstrated that the attention mechanism was best suited to increase the model accuracy by being on layer 2 or 10 of the network. With the choice of a dense layer being the final connected layer of the two architectures, therefore the only remaining option was to implement the mechanism on layer 2.…”

Section: B. The Proposed Model Architectures (mentioning)

confidence: 99%

Deep Recurrent Neural Networks with Attention Mechanisms for Respiratory Anomaly Classification

Wall, Zhang, et al. 2021

2021 International Joint Conference on Neural Networks (IJCNN)

In recent years, a variety of deep learning techniques and methods have been adopted to provide AI solutions to issues within the medical field, with one specific area being audio-based classification of medical datasets. This research aims to create a novel deep learning architecture for this purpose, with a variety of different layer structures implemented for undertaking audio classification. Specifically, bidirectional Long Short-Term Memory (BiLSTM) and Gated Recurrent Units (GRU) networks in conjunction with an attention mechanism, are implemented in this research for chronic and non-chronic lung disease and COVID-19 diagnosis. We employ two audio datasets, i.e. the Respiratory Sound and the Coswara datasets, to evaluate the proposed model architectures pertaining to lung disease classification. The Respiratory Sound Database contains audio data with respect to lung conditions such as Chronic Obstructive Pulmonary Disease (COPD) and asthma, while the Coswara dataset contains coughing audio samples associated with COVID-19. After a comprehensive evaluation and experimentation process, as the most performant architecture, the proposed attention BiLSTM network (A-BiLSTM) achieves accuracy rates of 96.2% and 96.8% for the Respiratory Sound and the Coswara datasets, respectively. Our research indicates that the implementation of the BiLSTM and attention mechanism was effective in improving performance for undertaking audio classification with respect to various lung condition diagnoses.

“…Traditional machine learning algorithms for audio pattern recognition include K-nearest neighbors, support vector machines, and Gaussian mixture models, etc. But with the support of more labelled datasets, neural networks based methods including convolutional neural networks [4,5], recurrent neural networks [6,7], and their combination [8,9], have achieved superior performance over the traditional approaches.…”

Section: Introduction (mentioning)

confidence: 99%

“…Li et al [15] proposed a multi-stream network with temporal attention in which the structure is composed of three streams, each containing a single temporal attention vector. Zhang et al [9] integrated temporal attention into its CRNN architecture and the same authors [16] proposed a model that combines channel attention and temporal attention together.…”

Section: Introduction (mentioning)

confidence: 99%

A Multi-Channel Temporal Attention Convolutional Neural Network Model for Environmental Sound Classification

Wang, Feng, Anderson 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recently, many attention-based deep neural networks have emerged and achieved state-of-the-art performance in environmental sound classification. The essence of attention mechanism is assigning contribution weights on different parts of features, namely channels, spectral or spatial contents, and temporal frames. In this paper, we propose an effective convolutional neural network structure with a multichannel temporal attention (MCTA) block, which applies a temporal attention mechanism within each channel of the embedded features to extract channel-wise relevant temporal information. This multi-channel temporal attention structure will result in a distinct attention vector for each channel, which enables the network to fully exploit the relevant temporal information in different channels. The datasets used to test our model include ESC-50 and its subset ESC-10, along with development sets of DCASE 2018 and 2019. In our experiments, MCTA performed better than the single-channel temporal attention model and the non-attention model with the same number of parameters. Furthermore, we compared our model with some successful attention-based models and obtained competitive results with a relatively lighter network.
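The per-channel temporal attention described in this abstract can be illustrated with a minimal sketch. This is not the authors' MCTA block: the dot-product scoring, the `score_weights` vectors, and the function names are assumptions made for illustration only. It shows how a separate softmax over time frames in each channel yields a distinct attention vector per channel.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_per_channel(features, score_weights):
    """Attention over time frames, computed independently in each channel.

    features[c][t][f]: embedded feature for channel c, frame t, bin f.
    score_weights[c][f]: hypothetical per-channel scoring vector (assumption).
    Returns (weights, pooled): weights[c] sums to 1 over frames, and
    pooled[c][f] is the attention-weighted sum of frames in channel c.
    """
    all_weights, pooled = [], []
    for chan, w in zip(features, score_weights):
        # One scalar relevance score per frame (dot product, an assumed choice).
        scores = [sum(x * wi for x, wi in zip(frame, w)) for frame in chan]
        alpha = softmax(scores)
        all_weights.append(alpha)
        n_bins = len(chan[0])
        pooled.append(
            [sum(a * frame[f] for a, frame in zip(alpha, chan)) for f in range(n_bins)]
        )
    return all_weights, pooled


# Toy input: 2 channels, 3 frames, 2 bins per frame.
features = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0], [0.0, 0.0]],
]
score_weights = [[1.0, 0.0], [0.0, 1.0]]
weights, pooled = temporal_attention_per_channel(features, score_weights)
```

A single-channel temporal attention model would instead share one attention vector across all channels; here `weights[0]` and `weights[1]` differ because each channel is scored on its own.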

“…Recent studies have shown that recurrent neural networks (RNN) produce excellent results for variable-length sound sequences. Zhang et al [23] proposed a CNN architecture to learn spectro-temporal features and a bidirectional gated recurrent unit (Bi-GRU) with a frame-level attention mechanism for sound classification. Wang et al [24] proposed a CNN architecture with a parallel temporal-spectral attention mechanism to capture certain frames where sound events occur and pay attention to varying frequency bands.…”

Section: Introduction (mentioning)

confidence: 99%

Detecting Drill Failure in the Small Short-sound Drill Dataset

Tran, Pham, Lundgren 2021

Preprint

Monitoring the conditions of machines is vital in the manufacturing industry. Early detection of faulty components in machines for stopping and repairing the failed components can minimize the downtime of the machine. This article presents an approach to detect the failure occurring in drill machines based on drill sounds from Valmet AB. The drill dataset includes three classes: anomalous sounds, normal sounds, and irrelevant sounds, which are also labeled as "Broken", "Normal", and "Other", respectively. Detecting drill failure effectively remains a challenge due to the following reasons. The waveform of drill sound is complex and short for detection. Additionally, in realistic soundscapes, there are sounds and noise in the context at the same time. Moreover, the balanced dataset is small to apply state-of-the-art deep learning techniques. To overcome these aforementioned difficulties, we augmented sounds to increase the number of sounds in the dataset. We then proposed a convolutional neural network (CNN) combined with a long short-term memory (LSTM) to extract features from log-Mel spectrograms and learn global high-level feature representation for the classification of three classes. A leaky rectified linear unit (Leaky ReLU) was utilized as the activation function for our proposed CNN instead of the rectified linear unit (ReLU). Moreover, we deployed an attention mechanism at the frame level after the LSTM layer to learn long-term global feature representations. As a result, the proposed method reached an overall accuracy of 92.35% for the drill failure detection system.
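For readers unfamiliar with the activation swap mentioned in this abstract, the difference between ReLU and Leaky ReLU can be sketched as follows. The negative slope of 0.01 is an assumed value for illustration; the abstract does not state which slope was used.

```python
def relu(x):
    """Standard ReLU: negative inputs are clamped to zero."""
    return x if x > 0.0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope, not zeroed.

    negative_slope=0.01 is an assumption, not a value taken from the paper.
    """
    return x if x > 0.0 else negative_slope * x


# Both functions agree on positive inputs and differ only on negative ones.
inputs = [-2.0, -0.5, 0.0, 1.5]
relu_out = [relu(v) for v in inputs]
leaky_out = [leaky_relu(v) for v in inputs]
```

Because the Leaky ReLU output is nonzero for negative inputs, its gradient there is also nonzero, which is the usual motivation for preferring it over ReLU.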
