2023
DOI: 10.1109/access.2023.3243986
|View full text |Cite
|
Sign up to set email alerts
|

Automatic Voice Disorder Detection Using Self-Supervised Representations

Abstract: Many speech features and models, including Deep Neural Networks (DNN), are used for classification tasks between healthy and pathological speech with the Saarbruecken Voice Database (SVD). However, accuracy values of 80.71% for phrases or 82.8% for vowels /aiu/ are the highest reported for audio samples in SVD when the evaluation includes the wide amount of pathologies in the database, instead of a selection of some pathologies. This paper targets this top performance in the state-of-the-art Automatic Voice Di… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
8
0

Year Published

2024
2024
2025
2025

Publication Types

Select...
4
2
1

Relationship

0
7

Authors

Journals

citations
Cited by 17 publications
(8 citation statements)
references
References 37 publications
0
8
0
Order By: Relevance
“…The deep learning models commonly used in these various studies are summarized in Table 6. Ribas et al [1] proposed a very promising approach of using a self supervised(SS) representations for classifying healthy and pathological voices. SS representations were basically learned from unlabeled speech data, and then used as an input to a classification model that is trained on labeled speech data.The authors explored two modalities of SS representations: frozen and fine tuned.…”
Section: Feature Extraction and Deep Learning Methodsmentioning
confidence: 99%
See 3 more Smart Citations
“…The deep learning models commonly used in these various studies are summarized in Table 6. Ribas et al [1] proposed a very promising approach of using a self supervised(SS) representations for classifying healthy and pathological voices. SS representations were basically learned from unlabeled speech data, and then used as an input to a classification model that is trained on labeled speech data.The authors explored two modalities of SS representations: frozen and fine tuned.…”
Section: Feature Extraction and Deep Learning Methodsmentioning
confidence: 99%
“…Overall, eight disorders were detected in these research articles including four speech and four voice disorders as shown in table 4. Most of the studies did not focus on one specific disorder, however they covered various disorders present within the category of either speech or sound disorder for instance: [1] covered 4 speech disorders, [9] covered 3 speech disorders, [11] covered 4 voice disorders, [15] worked on 3 voice disorders and [16] focuses on 6 voice disorders. Few studies including [4] and [8] were too general and their focus was to classify just pathological voice and healthy voice.…”
Section: Type Of Speech Disordermentioning
confidence: 99%
See 2 more Smart Citations
“…The model learns representations from the raw speech, and these representations can be used in the required downstream task. Examples of pre-trained models are wav2vec2 and HuBERT that have shown good performance in various speech technology tasks, such as ASR, emotion recognition, speaker and language identification, and voice disorder detection [47,48,49,50,51,52,53]. There are, however, no studies on using recent self-supervised pre-trained models, such as wav2vec2 [47] and HuBERT [54], for voice quality classification.…”
Section: Introductionmentioning
confidence: 99%