Dysarthric speech database for universal access research

Kim, Heejin; Hasegawa‐Johnson, Mark; Perlman, Adrienne L.; Gunderson, Jon; Huang, Thomas S.; Watkin, Kenneth L.; Frame, Simone R.

doi:10.21437/interspeech.2008-480

Cited by 216 publications

(59 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…The UA-Speech [13] corpus was used for all the experiments conducted in this study. The corpus consists of 16 dysarthric speakers whose data can be used, each speaking 455 unique words, with three repetitions (aka blocks) of each word except the uncommon words.…”

Section: Dysarthric Participantsmentioning

confidence: 99%

“…Next, we report on our extensive investigation in which we conducted multiple sets of experiments to propose a deep learning-based dysarthric intelligibility assessment optimal setup that recommends feature extraction approaches and other parameters future researchers and practitioners need to consider in order to design such a system. Then, we employed different evaluation strategies to thoroughly verify how the optimal setup performs with the 16 dysarthric subjects in the UA-Speech corpus [13] and across different intelligibility classes. Finally, we conducted further experiments to perform a comparative study benchmarking the performance of our proposed optimal setup against the state of the art by adopting similar strategies previous studies employed to verify their models.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

An Investigation to Identify Optimal Setup for Automated Assessment of Dysarthric Intelligibility using Deep Learning Technologies

2022

View full text Add to dashboard Cite

Recent advances in deep learning have provided an opportunity to improve and automate dysarthria intelligibility assessment, offering a cost-effective, accessible, and less subjective way to assess dysarthric speakers. However, reviewing previous literature in the area determines that the generalization of results on new dysarthric patients was not measured properly or incomplete among the previous studies that yielded very high accuracies due to the gaps in the adopted evaluation methodologies. This is of particular importance as any practical and clinical application of intelligibility assessment approaches must reliably generalize on new patients; otherwise, the clinicians cannot accept the assessment results provided by the system deploying the approach. In this paper, after these gaps are explained, we report on our extensive investigation to propose a deep learning–based dysarthric intelligibility assessment optimal setup. Then, we explain different evaluation strategies that were applied to thoroughly verify how the optimal setup performs with new speakers and across different classes of speech intelligibility. Finally, a comparative study was conducted, benchmarking the performance of our proposed optimal setup against the state of the art by adopting similar strategies previous studies employed. Results indicate an average of 78.2% classification accuracy for unforeseen low intelligibility speakers, 40.6% for moderate intelligibility speakers, and 40.4% for high intelligibility speakers. Furthermore, we noticed a high variance of classification accuracies among individual speakers. Finally, our proposed optimal setup delivered an average of 97.19% classification accuracy when adopting a similar evaluation strategy used by the previous studies.

show abstract

Section: Dysarthric Participantsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

An Investigation to Identify Optimal Setup for Automated Assessment of Dysarthric Intelligibility using Deep Learning Technologies

2022

View full text Add to dashboard Cite

show abstract

“…Other disorders and pathologies that also affect spoken language and communication skills, such as dysarthria, aphasia, autism, Parkinson's disease, dementia, Alzheimer's disease, etc, have also been investigated and spoken corpora have been collected. The reference corpora for dysarthric speech in American English include Nemours (Menendez-Pidal et al, 1996), Universal Access (Kim et al, 2008) and TORGO (Rudzicz et al, 2012). There are also corpora in other languages, such as Korean (Kim et al, 2016) and French (Fougeron et al, 2010;Meunier et al, 2016).…”

Section: State Of the Artmentioning

confidence: 99%

PRAUTOCAL corpus: a corpus for the study of Down syndrome prosodic aspects

Mancebo

Corrales-Astorgano

Aguilar

et al. 2021

Lang Resources & Evaluation

View full text Add to dashboard Cite

Oral productions of speakers with Down syndrome exhibit special characteristics that have been the target of study for decades. In spite of this attention, the availability of rich resources for its analysis is still scarce. In this paper, we present the definition and compiling procedure of a corpus of semicontrolled oral productions of speakers with Down syndrome that aims to allow the analysis of how these speakers with these speakers produce functional and linguistic aspects of speech. The PRAUTOCAL corpus has been recorded while using a video game for training oral competences. Utterances are related to well defined communicative tasks recorded by both speakers with Down syndrome and typically developing speakers. We present the procedure for human experts to evaluate the recordings and the transcription criteria followed for enriching the utterances of the corpus. PRAUTOCAL permits the analysis of the clear contrast in voice and speech between individuals with Down syndrome and typically developing speakers, taking into account the high heterogeneity of the speech problems characteristic of the syndrome. This material allows the analysis of the speech problems in Down syndrome, with applications to the generation of knowledge that could be used in future works for therapists to prepare specific training or enriching diagnosis regarding possible speech and language disorders.

show abstract

“…In this study we use the UASpeech corpus [8], which contains isolated-word recordings of 15 speakers with dysarthria. These recordings consist of 449 words which are divided into 3 blocks of equal length (B1, B2 and B3).…”

Section: Description Of the Dataset And Preprocessingmentioning

confidence: 99%

“…However, synthesising pathological speech via VC is not without challenges. Existing pathological speech corpora [8,9,5,10] provide healthy control speakers, but healthy speech recordings from the same pathological speaker are rarely available. This means that a successful pathological voice conversion system needs to learn conversion of both, the voice and pathological characteristics simultaneously, as suggested in previous work [4].…”

Section: Introductionmentioning

confidence: 99%

Pathological voice adaptation with autoencoder-based voice conversion

Illa¹,

Halpern²,

Son³

et al. 2021

11th ISCA Speech Synthesis Workshop (SSW 11)

View full text Add to dashboard Cite

In this paper, we propose a new approach to pathological speech synthesis. Instead of using healthy speech as a source, we customise an existing pathological speech sample to a new speaker's voice characteristics. This approach alleviates the evaluation problem one normally has when converting typical speech to pathological speech, as in our approach, the voice conversion (VC) model does not need to be optimised for speech degradation but only for the speaker change. This change in the optimisation ensures that any degradation found in naturalness is due to the conversion process and not due to the model exaggerating characteristics of a speech pathology. To show a proof of concept of this method, we convert dysarthric speech using the UASpeech database and an autoencoder-based VC technique. Subjective evaluation results show reasonable naturalness for high intelligibility dysarthric speakers, though lower intelligibility seems to introduce a marginal degradation in naturalness scores for mid and low intelligibility speakers compared to ground truth. Conversion of speaker characteristics for low and high intelligibility speakers is successful, but not for mid. Whether the differences in the results for the different intelligibility levels is due to the intelligibility levels or due to the speakers needs to be further investigated.

show abstract

Dysarthric speech database for universal access research

Cited by 216 publications

References 9 publications

An Investigation to Identify Optimal Setup for Automated Assessment of Dysarthric Intelligibility using Deep Learning Technologies

An Investigation to Identify Optimal Setup for Automated Assessment of Dysarthric Intelligibility using Deep Learning Technologies

PRAUTOCAL corpus: a corpus for the study of Down syndrome prosodic aspects

Pathological voice adaptation with autoencoder-based voice conversion

Contact Info

Product

Resources

About