Improving the intelligibility of dysarthric speech

Kain, Alexander; Hosom, John-Paul; Niu, Xiaochuan; Santen, Jan P. H. van; Fried-Oken, Melanie; Staehely, Janice

doi:10.1016/j.specom.2007.05.001

Cited by 146 publications

(92 citation statements)

References 22 publications


“…To develop this technique, we need a deep understanding of how to effectively factorize speech acoustics into its individual components such as linguistic, non-linguistic, and para-linguistic information using various technologies, such as speech analysis, speech synthesis, acoustic modeling, and machine learning. Moreover, VC has great potential to develop various applications not only for flexible control of speaker identity of synthetic speech in text-to-speech (TTS) [1] but also as a speaking aid for vocally handicapped people such as dysarthric patients [2] and laryngectomees [3], as a voice changer to flexibly generate various types of emotional [4] and expressive speech [5], for vocal effects to produce more varieties of singing voices [6,7], for enhanced mobile speech communication using wideband speech [8] and silent speech [9], for accent conversion in computer-assisted language learning [10], and so on. Therefore, it is worthwhile to study this technique for both scientific purposes and industrial applications.…”

Section: Introduction (mentioning)

confidence: 99%

The Voice Conversion Challenge 2016

et al. 2016


This paper describes the Voice Conversion Challenge 2016 devised by the authors to better understand different voice conversion (VC) techniques by comparing their performance on a common dataset. The task of the challenge was speaker conversion, i.e., to transform the voice identity of a source speaker into that of a target speaker while preserving the linguistic content. Using a common dataset consisting of 162 utterances for training and 54 utterances for evaluation from each of 5 source and 5 target speakers, 17 groups working in VC around the world developed their own VC systems for every combination of the source and target speakers, i.e., 25 systems in total, and generated voice samples converted by the developed systems. These samples were evaluated in terms of target speaker similarity and naturalness by 200 listeners in a controlled environment. This paper summarizes the design of the challenge, its result, and a future plan to share views about unsolved problems and challenges faced by the current VC techniques.
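The "every combination of the source and target speakers, i.e., 25 systems" arithmetic in the abstract is just a Cartesian product of the two speaker sets. A minimal sketch, assuming hypothetical speaker labels (the abstract gives only the counts) and assuming each system converts all 54 evaluation utterances of its source speaker:

```python
from itertools import product

# Hypothetical placeholder labels; only the counts (5 and 5) come from the abstract.
SOURCE_SPEAKERS = [f"S{i}" for i in range(1, 6)]
TARGET_SPEAKERS = [f"T{i}" for i in range(1, 6)]
TRAIN_UTTS_PER_SPEAKER = 162  # from the abstract
EVAL_UTTS_PER_SPEAKER = 54    # from the abstract


def conversion_pairs(sources, targets):
    """Every (source, target) combination a group had to build a system for."""
    return list(product(sources, targets))


pairs = conversion_pairs(SOURCE_SPEAKERS, TARGET_SPEAKERS)
print(len(pairs))  # → 25

# Assumption: each pair's system converts every evaluation utterance of its source speaker.
converted_per_group = len(pairs) * EVAL_UTTS_PER_SPEAKER
print(converted_per_group)  # → 1350
```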


Section: Introduction (mentioning)

confidence: 99%

The Voice Conversion Challenge 2016

et al. 2016


“…There has also been growing interest among researchers in exploring the characteristics of impaired speech with a view to developing ASR systems that can recognize it. Kain et al. [30], Kain et al. [8] and Rudzicz [31] modified the speech features of dysarthric speakers to more closely match those of non-dysarthric speakers. The study reported that the intelligibility of dysarthric speech can be improved by up to 20%.…”

Section: Research Background (mentioning)

confidence: 99%

“…[30]: English; F0, formant, intensity; dysarthric speech can be modified to improve intelligibility from 68% to 87%.
Kain et al. [8]: English dysarthric speakers; F0, formant, intensity; improved the intelligibility of one speaker's dysarthric vowels from 48% to 54%.
Rudzicz [31]: TORGO, English dysarthric speakers…”

Section: Research Background (mentioning)

confidence: 99%
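The intelligibility figures quoted above can be read either as absolute gains (percentage points) or as relative gains, and the two differ noticeably. A small sketch using only the numbers in the citation statements; the helper names are illustrative. On this arithmetic, the "up to 20%" figure quoted earlier is plausibly the 19-point absolute gain rather than a relative one, though the citing paper does not say which it meant.

```python
def absolute_gain(before_pct: float, after_pct: float) -> float:
    """Improvement in percentage points."""
    return after_pct - before_pct


def relative_gain(before_pct: float, after_pct: float) -> float:
    """Improvement as a percentage of the starting score."""
    return 100.0 * (after_pct - before_pct) / before_pct


# Before/after intelligibility scores as quoted in the citation statement.
for label, before, after in [("[30]", 68, 87), ("Kain et al. [8]", 48, 54)]:
    print(f"{label}: +{absolute_gain(before, after):.0f} points, "
          f"+{relative_gain(before, after):.1f}% relative")
# [30]: +19 points, +27.9% relative
# Kain et al. [8]: +6 points, +12.5% relative
```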

The Effect Of Changes In Speech Features On The Recognition Accuracy Of ASR System: A Study On The Malay Speech Impaired Children

Rosdi, Mustafa, Salim, et al. 2017

MJCS


Speech impairment refers to a disability that causes human speech production to deviate from the norm. Although several studies have been undertaken to identify the differences between non-impaired and impaired speech, little is known about their effects on speech intelligibility and on the performance of ASR systems in recognizing the impaired speech of children. This study investigates the features of impaired speech in relation to intelligibility deficits and degradation in ASR performance, including formant frequencies, intensity, fundamental frequency (F0), and perturbation features such as jitter and shimmer. As there is no existing speech database for performing the evaluation, we have developed a speech database of speech-impaired children and have analysed the impaired speech features. We have identified significant differences in the selected features. We have also identified the relationship between the ASR system's Word Error Rate (WER) on impaired speech and the speech features. The results show that there are significant differences in F0, jitter and shimmer between the Control Group (CG) and the Speech Impaired Group (SIG). This paper explains the differences between impaired and non-impaired speech that can be used in developing automatic speech recognition systems. We have observed that F0 affects ASR performance and was found to be a significant predictor of the recognition accuracy of the vowel phonemes /e/ and /u/.
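The abstract does not say which jitter and shimmer variants were measured. A minimal sketch, assuming the common "local" definitions (as used, for example, in Praat): the mean absolute difference between consecutive glottal periods (or per-period peak amplitudes), divided by the mean period (or amplitude). Period and amplitude extraction from the waveform is assumed to have been done already; the function names are hypothetical.

```python
from statistics import mean


def _local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two consecutive periods")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods):
    """Cycle-to-cycle variation of glottal period length (dimensionless ratio)."""
    return _local_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of per-period peak amplitude (dimensionless ratio)."""
    return _local_perturbation(amplitudes)


# Alternating 10 ms / 12 ms periods: every step differs by 2 ms, mean period is 11 ms.
print(f"{local_jitter([10.0, 12.0, 10.0, 12.0]):.4f}")  # 0.1818
```

Multiplying either ratio by 100 gives the percentage form that is usually reported.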


“…The capability of handling the speaker characteristics within a speech signal has great potential to be employed in real-world applications. Indeed, this so-called voice conversion (VC) framework has been used in several works, such as, singing voice conversion [1,2], body-conducted speech conversion [3], speech signal recovery [4,5], and speech modification [6]. The growing interest in VC development motivated many researchers around the world to conceive the 1st Voice Conversion Challenge in 2016 [7].…”

Section: Introduction (mentioning)

confidence: 99%

NU Voice Conversion System for the Voice Conversion Challenge 2018

Tobing, Wu, Hayashi, et al. 2018

EasyChair Preprints


This paper presents the NU (Nagoya University) voice conversion (VC) system for the HUB task of the Voice Conversion Challenge 2018 (VCC 2018). The design of the NU VC system can basically be separated into two modules consisting of a speech parameter conversion module and a waveform-processing module. In the speech parameter conversion module, a deep learning framework is deployed to estimate the spectral parameters of a target speaker given those of a source speaker. Specifically, a deep neural network (DNN) and a deep mixture density network (DMDN) are used as the deep model structure. In the waveform-processing module, given the estimated spectral parameters and linearly transformed F0 parameters, the converted waveform is generated using a WaveNet-based vocoder system. To use the WaveNet-based vocoder, there are several generation flows based on an analysis-synthesis framework to obtain the speech parameter set, on the basis of which a system selection process is performed to select the best one in an utterance-wise manner. The results of VCC 2018 ranked the NU VC system in second place with an overall mean opinion score (MOS) of 3.44 for speech quality and 85% accuracy for speaker similarity.
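The abstract says only that F0 is "linearly transformed". A widely used form of such a transform, shown here as an assumption rather than as the NU system's exact method, maps each voiced frame's log F0 from the source speaker's mean and standard deviation onto the target speaker's, leaving unvoiced frames (F0 = 0) untouched. The function names and toy contours below are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_values):
    """Mean and population std of log F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    return mean(logs), pstdev(logs)


def convert_f0(src_f0, src_stats, tgt_stats):
    """Linearly map voiced source log F0 onto the target distribution; unvoiced stays 0."""
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    converted = []
    for f in src_f0:
        if f <= 0:
            converted.append(0.0)  # keep unvoiced frames unvoiced
        else:
            z = (math.log(f) - mu_s) / sd_s  # standardize in the log domain
            converted.append(math.exp(z * sd_t + mu_t))
    return converted


# Toy contours in Hz; 0.0 marks an unvoiced frame.
src = [100.0, 0.0, 120.0, 110.0]
tgt = [200.0, 220.0, 0.0, 240.0]
converted = convert_f0(src, log_f0_stats(src), log_f0_stats(tgt))
```

Working in the log domain keeps the transform proportional to pitch intervals; the source statistics need at least two distinct voiced values, or the standard deviation is zero.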
