Computer Synthesized Speech Technologies
DOI: 10.4018/978-1-61520-725-1.ch006

Building Personalized Synthetic Voices for Individuals with Dysarthria using the HTS Toolkit

Abstract: An individual with a speech impairment may need a device that produces synthesized speech to assist their communication. To fully support all the functions of human speech communication (communicating information, maintaining social relationships, and displaying identity), the voice must be intelligible and natural-sounding. Ideally, it must also be capable of conveying the speaker’s vocal identity. A new approach based on hidden Markov models (HMMs) has been proposed as a way of c…

Cited by 7 publications (8 citation statements)
References 26 publications
“…Platform for medical voice banking: These voices may be used as a platform for medical voice banking. In [67], the HTS framework was used as personalized synthetic voices for patients who have dysarthria and thus require TTS systems as communication aids. The patients can choose the most similar voice from a wide variety of voices.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…This method [9] starts with a speaker-independent model, or "average voice model", learned over multiple speakers and uses model adaptation techniques drawn from speech recognition such as maximum likelihood linear regression (MLLR), to adapt the speaker independent model to a new speaker. It has been shown that using 100 sentences or approximately 6-7 minutes of speech data is sufficient to generate a speaker-adapted voice that sounds similar to the target speech [7]. In the following of this paper we refer the speaker-adapted voices as "voice clones".…”
Section: Speaker Adaptation
Citation type: mentioning
confidence: 99%
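
The adaptation idea in the statement above, moving a speaker-independent "average voice" toward a new speaker through a shared linear transform of the model means, can be illustrated with a deliberately simplified sketch. It is not the HTS implementation and not full MLLR: it assumes a single global transform that is diagonal (one scale and one offset per feature dimension), equal variances, and a fixed frame-to-state alignment, in which case the maximum-likelihood estimate reduces to ordinary least squares. All names (`estimate_diagonal_transform`, `adapt_means`, the state labels) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def estimate_diagonal_transform(
    avg_means: Dict[str, Vector],
    frames: Sequence[Tuple[str, Vector]],
) -> Tuple[Vector, Vector]:
    """Fit x ~ a * mu + b per dimension by least squares.

    avg_means maps a state label to its average-voice mean vector.
    frames is a list of (state label, observed feature vector) pairs
    from the target speaker, i.e. an assumed hard alignment.
    """
    dim = len(next(iter(avg_means.values())))
    n = len(frames)
    scales: Vector = []
    offsets: Vector = []
    for d in range(dim):
        mu = [avg_means[state][d] for state, _ in frames]
        x = [obs[d] for _, obs in frames]
        mu_bar = sum(mu) / n
        x_bar = sum(x) / n
        var = sum((m - mu_bar) ** 2 for m in mu)
        cov = sum((m - mu_bar) * (xi - x_bar) for m, xi in zip(mu, x))
        # Fall back to a pure offset when the means do not vary.
        a = cov / var if var > 0 else 1.0
        scales.append(a)
        offsets.append(x_bar - a * mu_bar)
    return scales, offsets


def adapt_means(
    avg_means: Dict[str, Vector], scales: Vector, offsets: Vector
) -> Dict[str, Vector]:
    """Apply the shared transform to every state, seen or unseen."""
    return {
        state: [a * m + b for a, m, b in zip(scales, mean, offsets)]
        for state, mean in avg_means.items()
    }


# Toy example: the target speaker only provides data for states "a" and "b".
average_voice = {"a": [0.0, 1.0], "b": [2.0, 3.0], "c": [4.0, 5.0]}
target_frames = [("a", [1.0, -0.5]), ("b", [5.0, 0.5])]

scales, offsets = estimate_diagonal_transform(average_voice, target_frames)
adapted = adapt_means(average_voice, scales, offsets)
```

Because the transform is shared across states, state "c" is adapted even though the target speaker supplied no data for it. This sharing is why a small amount of speech can adapt a whole model.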
“…This approach can be seen as a first attempt of model-based voice reconstruction although it relies only on a partial modeling of the voice components. A voice building process using the hidden Markov model (HMM)-based speech synthesis technique has been investigated to create personalized VOCAs [7][8][9][10]. This approach has been shown to produce high quality output and offers two major advantages over existing methods for voice banking and voice building.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…While a state-of-the-art concatenative method [1,2] for TTS is capable of synthesizing natural and smooth speech for a specific voice, an SSS-based approach [3,4] has the strength to produce a diverse spectrum of voices without requiring significant amount of new data. This is an important feature for building next-generation applications such as a story-telling robot capable of synthesizing the speech of multiple characters with different emotions, personalized speech synthesis such as in speech-to-speech translation [5,6], and clinical applications such as voice reconstruction of patients with speech disorders [7]. In this article, we study the problem of generating new models of SSS from existing models.…”
Section: Introduction
Citation type: mentioning
confidence: 99%