2020
DOI: 10.1109/access.2020.2985280

A Survey on Signal Processing Based Pathological Voice Detection Techniques

Abstract: Voice disability is a barrier to effective communication. Around 1.2% of the world's population faces some form of voice disability. Surgical procedures, namely laryngoscopy, laryngeal electromyography, and stroboscopy, are used for voice disability diagnosis. Researchers and practitioners have been working to find alternatives to these surgical procedures; voice-sample-based diagnosis is one of them. The major steps followed by these works are (a) to extract voice features from voice samples and (b) to disc…

Cited by 68 publications (21 citation statements)
References 90 publications
“…The optimal classifier increases the recall to 0.83, the specificity to 0.95, the G value to 0.88, and the F1 value to 0.86. As an ensemble learning model, RF performs better than single classifiers in pathological voice classification, which is also reflected in the latest review paper [6]. Meanwhile, the same effect is shown in two other typical ensemble learning models (GBDT, XGBoost).…”
Section: Experimental Results and Analysis (mentioning)
confidence: 63%
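
The metrics named in the statement above can be related to a confusion matrix as in the following minimal sketch. It assumes that "G value" means the geometric mean of recall and specificity, which the quoted text does not confirm. The function name `imbalance_aware_metrics` and the confusion-matrix counts are hypothetical, chosen only so that recall and specificity equal the quoted 0.83 and 0.95. The remaining metrics depend on the real class balance and need not match the quoted figures.

```python
import math


def imbalance_aware_metrics(tp, fn, tn, fp):
    """Return recall, specificity, precision, F1 and G-mean for a binary classifier.

    tp/fn count pathological samples (positive class), tn/fp count healthy ones.
    """
    recall = tp / (tp + fn)            # share of pathological voices detected
    specificity = tn / (tn + fp)       # share of healthy voices correctly rejected
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Assumption: "G value" is the geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts, not taken from any cited study.
metrics = imbalance_aware_metrics(tp=83, fn=17, tn=95, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike plain accuracy, recall and specificity are each computed within a single class, so their geometric mean is not dominated by the majority class. This is one reason such metrics are preferred on class-imbalanced voice databases.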
“…In biomedical engineering, different features are extracted from signals to build VPD systems that automatically detect pathological voices. Most of these studies have experimented with the Massachusetts Eye and Ear Infirmary (MEEI) database [5], which has become one of the standard databases for VPD systems [6]. Nevertheless, in the past studies on voice pathology detection, many researchers ignored the class-imbalanced distribution of voice samples in the MEEI database.…”
Section: Introduction (mentioning)
confidence: 99%
“…For each frame, the mel-spectrogram was calculated using 64 mel-frequency bands, an FFT window length of 1024, a hop length of 64, an upper frequency bound of 16384 Hz and the HTK-formula (23) for conversion from Hertz to mel. The advantage of mel-spectrograms is that the center frequency and bandwidth of the chosen triangular filters roughly match the auditory critical band filters (24). Using the Python package librosa (25), each 500 ms frame resulted in a mel-spectrogram with 64 frequency points and 345 time frames.…”
Section: Methods (mentioning)
confidence: 99%
“…The advantage of mel-spectrograms is that the center frequency and bandwidth of the chosen triangular filters roughly match the auditory critical band filters. [28] Using the Python package librosa, [29] each 500 ms frame resulted in a mel-spectrogram with 64 frequency points and 345 time frames.…”
Section: Data Preprocessing (mentioning)
confidence: 99%
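
The mel-spectrogram settings quoted in the statements above can be checked with a short standard-library sketch. It is not the cited authors' code. The 44.1 kHz sampling rate is an assumption (the quotes do not state it), picked because it reproduces the quoted 345 time frames when frames are centered and zero-padded, as librosa does by default. The lower frequency bound of 0 Hz and all function names are likewise hypothetical.

```python
import math


def hz_to_mel_htk(f_hz):
    """HTK formula for converting Hertz to mel."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of the HTK Hertz-to-mel formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels, fmin_hz, fmax_hz):
    """Center frequencies (Hz) of n_mels triangular filters spaced evenly in mel."""
    lo, hi = hz_to_mel_htk(fmin_hz), hz_to_mel_htk(fmax_hz)
    step = (hi - lo) / (n_mels + 1)  # n_mels + 2 edge points in total
    return [mel_to_hz_htk(lo + step * i) for i in range(1, n_mels + 1)]


def centered_frame_count(n_samples, hop_length):
    """Number of STFT frames when the signal is padded so frames are centered."""
    return 1 + n_samples // hop_length


SAMPLE_RATE = 44100                      # assumption: not stated in the quotes
n_samples = int(0.5 * SAMPLE_RATE)       # one 500 ms frame
frames = centered_frame_count(n_samples, hop_length=64)
centers = mel_band_centers(n_mels=64, fmin_hz=0.0, fmax_hz=16384.0)

print(frames)                            # time frames per 500 ms segment
print(round(centers[0], 1), round(centers[-1], 1))
```

With these assumptions each 500 ms segment yields a 64 x 345 array, the shape reported in the quoted passages. The band centers lie close together at low frequencies and spread out toward 16384 Hz, which is the property the statements link to the auditory critical bands.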