2010
DOI: 10.1109/tasl.2010.2045237

Thousands of Voices for HMM-Based Speech Synthesis–Analysis and Application of TTS Systems Built on Various ASR Corpora

Abstract: In conventional speech synthesis, large amounts of phonetically balanced speech data recorded in highly controlled recording studio environments are typically required to build a voice. Although using such data is a straightforward solution for high quality synthesis, the number of voices available will always be limited, because recording costs are high. On the other hand, our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which u…

Cited by 68 publications (43 citation statements)
References 42 publications (53 reference statements)

“…F0 and F1 values for each of the 60 speakers were calculated at the midpoint of each vowel and we took the mean over all vowel tokens (once again using the Snack toolkit). We used the synthetic speech from Yamagishi et al. (2010) for the 60 speakers. We also calculated F0 and F1 values for speaker 001's English synthetic speech and Japanese synthetic speech, which was obtained by cross-lingual speaker adaptation based on his English speech data (2000 adaptation sentences).…”
Section: Speech Materials for KLD Analysis (mentioning)
Confidence: 99%
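The midpoint F0/F1 measurement described in the citation above can be illustrated with a short script. This is a minimal sketch only: it substitutes Praat (via the parselmouth package) for the Snack toolkit mentioned in the quotation, and the wav path and vowel interval list are hypothetical placeholders rather than data from the study.

```python
import statistics
import parselmouth
from parselmouth.praat import call

def mean_midpoint_f0_f1(wav_path, vowel_intervals):
    """Mean F0 and F1 (Hz) sampled at the midpoint of each vowel token.

    vowel_intervals: list of (start_s, end_s) tuples, one per vowel token.
    """
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch()             # F0 contour
    formant = snd.to_formant_burg()    # formant tracks (Burg method)

    f0_values, f1_values = [], []
    for start, end in vowel_intervals:
        mid = 0.5 * (start + end)      # vowel midpoint, as in the quoted procedure
        f0 = call(pitch, "Get value at time", mid, "Hertz", "Linear")
        f1 = call(formant, "Get value at time", 1, mid, "Hertz", "Linear")
        if f0 == f0:                   # skip NaN, i.e. unvoiced midpoints
            f0_values.append(f0)
        if f1 == f1:                   # skip undefined formant values
            f1_values.append(f1)

    return statistics.mean(f0_values), statistics.mean(f1_values)

# Hypothetical usage for one speaker's speech and vowel segmentation:
# f0_mean, f1_mean = mean_midpoint_f0_f1("speaker_001.wav",
#                                        [(0.12, 0.24), (0.51, 0.63)])
```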
“…and the JNAS database for Japanese (Itou et al., 1998). Details of the front-end text processing used to derive phonetic-prosodic labels from the word transcriptions can be found in Yamagishi et al. (2010).…”
Section: Introduction (mentioning)
Confidence: 99%
“…Nowadays, two main speech processing techniques allow the creation of synthetic speech spoofing signals: first, statistical speech synthesizers (Yoshimura et al., 1999; Tokuda et al., 2002) using voices adapted to a particular speaker (Yamagishi et al., 2009), even with minimum-quality material (Yamagishi et al., 2010); second, voice conversion (VC) techniques (Jin et al., 2008; Kinnunen et al., 2012).…”
Section: Introduction (mentioning)
Confidence: 99%
“…HMM-based speech synthesis systems have the ability to synthesize speech with a degree of naturalness comparable to state-of-the-art unit selection systems [3]. The concept was first proposed by Yoshimura et al. [4] and has been developed for languages such as Japanese, English, Thai, Romanian, Mandarin, Korean, Austrian German, Portuguese, Arabic, Hungarian and German, among others [5]-[15].…”
Section: Introduction (mentioning)
Confidence: 99%