Environment Sound Classification Based on Visual Multi-Feature Fusion and GRU-AWS

Peng, Ning Song; Chen, Aibin; Zhou, Guoxiong; Chen, Wenjie; Liu, Jing; Ding, Fubo

doi:10.1109/access.2020.3032226

Cited by 21 publications

(13 citation statements)

References 42 publications

“…The performance of the model has seven evaluators: accuracy (14), sensitivity (15), specificity (16), precision (17), the F1-score (18), Cohen's kappa (19), and the Matthews correlation coefficient (MCC) (20). The model was assessed using the evaluation index.…”

Section: Model Evaluation (mentioning)

confidence: 99%
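The quoted passage names seven evaluation metrics but does not show how they are computed. The sketch below derives all seven from a confusion matrix, using NumPy as an assumed dependency. The function name `evaluate`, the example matrix, and the decision to macro-average the per-class sensitivity, specificity, precision, and F1 are illustrative assumptions. They are not the cited authors' code.

```python
import numpy as np


def evaluate(cm):
    """Seven evaluation metrics from a K x K confusion matrix.

    Rows are true classes and columns are predicted classes. Sensitivity,
    specificity, precision and F1 are macro-averaged over classes (an
    assumption; the quoted text does not state the averaging scheme).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)
    true_counts = cm.sum(axis=1)   # samples per true class
    pred_counts = cm.sum(axis=0)   # samples per predicted class
    fn = true_counts - tp
    fp = pred_counts - tp
    tn = n - tp - fn - fp

    recall = tp / (tp + fn)        # per-class sensitivity
    prec = tp / (tp + fp)          # per-class precision
    spec = tn / (tn + fp)          # per-class specificity
    f1 = 2 * prec * recall / (prec + recall)

    accuracy = tp.sum() / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = (true_counts * pred_counts).sum() / n**2
    kappa = (accuracy - p_e) / (1 - p_e)
    # Multiclass MCC computed from the same confusion matrix.
    num = tp.sum() * n - (true_counts * pred_counts).sum()
    den = np.sqrt((n**2 - (pred_counts**2).sum()) * (n**2 - (true_counts**2).sum()))
    mcc = num / den

    return {
        "accuracy": accuracy,
        "sensitivity": recall.mean(),
        "specificity": spec.mean(),
        "precision": prec.mean(),
        "f1": f1.mean(),
        "kappa": kappa,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical 2-class confusion matrix, for illustration only.
    cm = [[50, 10],
          [5, 35]]
    for name, value in evaluate(cm).items():
        print(f"{name:>11}: {value:.4f}")
```

For two classes, the multiclass MCC formula used here reduces to the familiar binary expression (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). For the example matrix, both expressions give an MCC of about 0.698.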

Lightweight Skip Connections With Efficient Feature Stacking for Respiratory Sound Classification

Choi

Lee

et al. 2022

IEEE Access

As the number of deaths from respiratory diseases due to COVID-19 and infectious diseases increases, early diagnosis is necessary. In general, the diagnosis of diseases is based on imaging devices (e.g., computed tomography and magnetic resonance imaging) as well as the patient's underlying disease information. However, these examinations are time-consuming, incur considerable costs, and in a situation like the ongoing pandemic, face-to-face examinations are difficult to conduct. Therefore, we propose a lung disease classification model based on deep learning using non-contact auscultation. In this study, two respiratory specialists collected normal respiratory sounds and five types of abnormal sounds associated with lung disease, including those associated with four lung lesions in the left and right anterior chest and left and right posterior chest. For preprocessing and feature extraction, noise was removed using three filters (low-pass, band-pass, and high-pass), and respiratory sound features were extracted using the Log-Mel Spectrogram and Mel-Frequency Cepstral Coefficients, followed by feature stacking. Then, based on the extracted respiratory sound information, we propose a lung disease classification model of dense lightweight convolutional neural network-bidirectional gated recurrent unit skip connections using depthwise separable convolution. The performance of the classification model was compared with both the baseline and the lightweight models. The results indicate that the proposed model achieves high performance, with an accuracy of 92.3%, sensitivity of 92.1%, specificity of 98.5%, and F1-score of 91.9%. Using the proposed model, we aim to contribute to the early detection of diseases during the COVID-19 pandemic.
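The abstract attributes the model's light weight to depthwise separable convolution. A rough weight count makes the saving concrete. The helper names and the 3 × 3, 64 → 128 channel layer below are hypothetical and were chosen only for illustration. Bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in to c_out channels (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a pointwise 1 x 1 conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels.
    k, c_in, c_out = 3, 64, 128
    std = standard_conv_params(k, c_in, c_out)
    ds = depthwise_separable_params(k, c_in, c_out)
    print(std, ds, round(std / ds, 2))   # 73728 8768 8.41
```

The ratio of the two counts is k²·c_out / (k² + c_out). For 3 × 3 kernels, the saving therefore approaches a factor of 9 as the number of output channels grows.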

“…Feature extraction from MFCCs was performed using pre-emphasis, windowing, fast Fourier transform, Mel filtering, nonlinear transformation, and discrete cosine transform [15]. The first feature consisted of 40-dimension MFCCs [16,17]. Next, for the second and third features, we calculated the MFCC trajectories over time (delta MFCCs) and the second-order delta of MFCCs.…”

Section: Comparison and Evaluation (mentioning)

confidence: 99%
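The quoted steps map directly onto a short NumPy pipeline. The sketch below is one possible reading of those steps, not the cited authors' implementation. It assumes a 16 kHz signal, 25 ms Hamming windows with a 10 ms hop, a 512-point FFT, a pre-emphasis coefficient of 0.97, the HTK Mel formula, a logarithm as the "nonlinear transformation," and a ±2-frame regression window for the deltas. It keeps 40 coefficients to match the 40-dimension MFCCs in the quote. All function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def dct2(x, n_out):
    """Orthonormal DCT-II over the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(signal, sr=16000, n_fft=512, win=400, hop=160, n_mels=40, n_mfcc=40, alpha=0.97):
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])   # pre-emphasis
    n_frames = 1 + (len(emphasized) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(win)                             # windowing
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft              # FFT power spectrum
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T               # Mel filtering
    log_mel = np.log(mel_energy + 1e-10)                                   # nonlinearity (log)
    return dct2(log_mel, n_mfcc)                                           # DCT -> (frames, n_mfcc)


def delta(feat, N=2):
    """Regression-based time derivative along axis 0; edges padded by repetition."""
    T = feat.shape[0]
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / (2 * sum(n * n for n in range(1, N + 1)))


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440 * t)        # 1 s synthetic tone
    c = mfcc(x, sr=sr)
    d1 = delta(c)                          # delta MFCCs
    d2 = delta(d1)                         # delta-delta MFCCs
    features = np.concatenate([c, d1, d2], axis=1)
    print(features.shape)                  # (98, 120)
```

Stacking the MFCCs with their first- and second-order deltas gives 3 × 40 = 120 values per frame.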

Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study

Hu

Chang

Wang

et al. 2021

J Med Internet Res

Background: Dysphonia influences the quality of life by interfering with communication. However, a laryngoscopic examination is expensive and not readily accessible in primary care units, and experienced laryngologists are required to achieve an accurate diagnosis.

Objective: This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence.

Methods: We collected 189 normal voice samples and 552 samples of individuals with voice disorders, including vocal atrophy (n=224), unilateral vocal paralysis (n=50), organic vocal fold lesions (n=248), and adductor spasmodic dysphonia (n=30). The 741 samples were divided into 2 sets: 593 samples as the training set and 148 samples as the testing set. A convolutional neural network approach was applied to train the model, and findings were compared with those of human specialists.

Results: The convolutional neural network model achieved a sensitivity of 0.66, a specificity of 0.91, and an overall accuracy of 66.9% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. By comparison, the overall accuracy rates of the human specialists were 60.1% and 56.1% for the 2 laryngologists and 51.4% and 43.2% for the 2 general ear, nose, and throat doctors.

Conclusions: Voice alone could be used for common vocal fold disease recognition through a deep learning approach after training with our Mandarin pathological voice database. This approach involving artificial intelligence could be clinically useful for screening general vocal fold disease using the voice. The approach includes a quick survey and a general health examination. It can be applied during telemedicine in areas with primary care units lacking laryngoscopic abilities. It could support physicians when prescreening cases by allowing invasive examinations to be performed only for cases involving problems with automatic recognition or listening and for professional analyses of other clinical examination results that reveal doubts about the presence of pathologies.

“…Our feature engineering process was derived from reference [31]. Fusing of multi-spectrogram features as one new feature has been proposed to improve sound recognition accuracy [31]. A total of three features were extracted.…”

Section: Methods (mentioning)

confidence: 99%

Efficiently Classifying Lung Sounds through Depthwise Separable CNN Models with Fused STFT and MFCC Features

Jung

Liao

et al. 2021

Diagnostics

Lung sounds remain vital in clinical diagnosis as they reveal associations with pulmonary pathologies. With COVID-19 spreading across the world, it has become more pressing for medical professionals to better leverage artificial intelligence for faster and more accurate lung auscultation. This research aims to propose a feature engineering process that extracts the dedicated features for the depthwise separable convolution neural network (DS-CNN) to classify lung sounds accurately and efficiently. We extracted a total of three features for the shrunk DS-CNN model: the short-time Fourier-transformed (STFT) feature, the Mel-frequency cepstrum coefficient (MFCC) feature, and the fused features of these two. We observed that while DS-CNN models trained on either the STFT or the MFCC feature achieved an accuracy of 82.27% and 73.02%, respectively, fusing both features led to a higher accuracy of 85.74%. In addition, our method achieved 16 times higher inference speed on an edge device and only 0.45% less accuracy than RespireNet. This finding indicates that the fusion of the STFT and MFCC features and DS-CNN would be a model design for lightweight edge devices to achieve accurate AI-aided detection of lung diseases.
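The abstract does not say how the STFT and MFCC features are fused. One plausible reading, assumed here purely for illustration, is to standardize each frame-aligned feature matrix and concatenate the two along the feature axis. The window settings, the helper names, the white-noise test signal, and the random placeholder used in place of real MFCCs are all hypothetical.

```python
import numpy as np


def log_stft(signal, n_fft=512, win=400, hop=160):
    """Log-magnitude STFT, shape (n_frames, n_fft // 2 + 1)."""
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(win)
    return np.log(np.abs(np.fft.rfft(frames, n=n_fft)) + 1e-10)


def standardize(feat):
    """Zero mean, unit variance per feature column."""
    return (feat - feat.mean(axis=0)) / (feat.std(axis=0) + 1e-8)


def fuse(stft_feat, mfcc_feat):
    """Concatenate frame-aligned STFT and MFCC features along the feature axis."""
    t = min(len(stft_feat), len(mfcc_feat))     # align frame counts
    return np.concatenate([standardize(stft_feat[:t]), standardize(mfcc_feat[:t])], axis=1)


if __name__ == "__main__":
    sr = 16000
    x = np.random.default_rng(1).normal(size=sr)             # 1 s of white noise
    stft_feat = log_stft(x)                                   # (98, 257)
    # Placeholder standing in for 40 MFCCs computed elsewhere with the same hop.
    mfcc_feat = np.random.default_rng(0).normal(size=(98, 40))
    fused = fuse(stft_feat, mfcc_feat)
    print(fused.shape)                                        # (98, 297)
```

Standardizing before concatenation puts the two feature types on a common scale, so neither dominates simply because of its units. Stacking them as separate input channels would be an equally plausible design. This sketch does not attempt that.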
