2022
DOI: 10.1109/access.2022.3205591

Classification of Cough Sounds Using Spectrogram Methods and a Parallel-Stream One-Dimensional Deep Convolutional Neural Network

Abstract: Currently, a subjective method is used to diagnose cough sounds, particularly wet and dry coughs, which can lead to incorrect diagnoses. In this study, novel emergent features were extracted using spectrogram methods and a parallel-stream one-dimensional (1D) deep convolutional neural network (DCNN) to classify cough sounds. The data of this study were obtained from two datasets. We employed the Mel spectrogram, chromagram constant-Q transform, Mel-frequency cepstral coefficient, constant-Q cepstral coefficient…

Cited by 7 publications (6 citation statements)
References 51 publications
“…Studies have noted the importance of Mean F0 and Mean VTL that characterise 'who is talking' or speaker identity (Lavan, Knight, et al, 2019). The current study contributed to the argument that speaker identity and long-term traits, and short-term states, such as speaker emotions, are intertwined (Belin et al, 2004). Chroma_cqt reflects the twelve pitch classes C, C#, D, D#, E, F, F#, G, G#, A, A#, and B, from lowest to highest in the Western music scale (Huang & Mushi, 2022). Higher Chroma_cqt suggests the speech sample was closer to the higher note in the music scale.…”
Section: Discussion 4.1 Characterising Human Vocal Confidence Through VTL
confidence: 60%
“…For example, in [13], a random-padding algorithm was proposed to eliminate temporal differences between environmental sound signals in preprocessing. In [14], a zero-padding system was created for preprocessing. In this system, signals of a shorter-than-normal duration are padded with 0s.…”
Section: A. Signal Preprocessing
confidence: 99%
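The zero-padding scheme described above can be sketched in a few lines; the fixed target length below is a hypothetical value chosen for illustration, not one taken from the cited paper:

```python
import numpy as np

def pad_to_length(signal: np.ndarray, target_len: int) -> np.ndarray:
    """Zero-pad a 1-D audio signal with trailing 0s up to target_len samples,
    so that all model inputs share one duration; longer signals are truncated."""
    if len(signal) >= target_len:
        return signal[:target_len]
    padding = np.zeros(target_len - len(signal), dtype=signal.dtype)
    return np.concatenate([signal, padding])

# Example: a 3-sample signal padded to 5 samples
padded = pad_to_length(np.ones(3), 5)  # -> [1., 1., 1., 0., 0.]
```

Padding at the tail (rather than symmetrically) keeps the onset of the cough aligned with the start of the input window, which is a common but not universal choice.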
“…The audio signals in our dataset are spectrogram and time-domain signals. The spectrogram signals were from Mel and log-Mel spectrograms, whose formulas are presented in [14] and [23], respectively. The time-domain signals, which are denoted SIG-WAVE, comprised blended vocal and noise signals.…”
Section: Feature Extraction and Deep Learning Models — A. Feature Extraction
confidence: 99%
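A minimal numpy-only sketch of the log-Mel spectrogram pipeline referenced above; the sample rate, FFT size, hop, and filter count are illustrative defaults, not parameters from the cited papers:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, mapping an
    (n_fft // 2 + 1)-bin power spectrum onto n_mels mel bands."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Frame the signal with a Hann window, take the power spectrum of
    each frame, project onto the mel filterbank, and apply a log."""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # small floor avoids log(0)
```

In practice a library such as librosa would be used for this step; the sketch only shows the structure of the feature (time frames by mel bands) that feeds the 1D DCNN streams.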
“…The model is trained on 5 different genres and recognizes them with an accuracy of 90.32%. Richard M. used convolutional neural networks to classify cough sounds using features such as mel spectrograms and mel-frequency coefficients and achieved an accuracy of 82.96% [5]. Satish K. also conducted research in the field of medicine and used deep learning to monitor the pulse status of patients and achieved 85% accuracy [6].…”
Section: Introduction
confidence: 99%