2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2011
DOI: 10.1109/icassp.2011.5947572

Speaker similarity evaluation of foreign-accented speech synthesis using HMM-based speaker adaptation

Abstract: This paper describes a speaker discrimination experiment in which native English listeners were presented with natural and synthetic speech stimuli in English and were asked to judge whether they thought the sentences were spoken by the same person or not. The natural speech consisted of recordings of Finnish speakers speaking English. The synthetic stimuli were created using adaptation data from the same Finnish speakers. Two average voice models were compared: one trained on Finnish-accented English and the …

Cited by 13 publications (22 citation statements). References 10 publications.

“…These findings give a good basis to further explore the behaviour of listeners in S2ST system evaluations. Preliminary experiments investigating various aspects of listeners' behaviour on synthetic speech in a S2ST context can be found in Wester and Karhila (2011); Karhila and Wester (2011); Wester and Liang (2011a).…”
Section: Discussion (mentioning)
confidence: 99%
“…In parallel with the research presented in this paper, other research has been investigating the above issues. For more details, please refer to Wester (2010); Wester and Karhila (2011); Tsuzaki et al (2011).…”
Section: Discussion (mentioning)
confidence: 99%
“…As references for judging the degree of speaker similarity of the synthetic speech to the original speaker, we used natural speech. However, it has been shown that there is a significant degradation in a listener's ability to decide on speaker similarity when comparing natural and synthetic speech stimuli (Wester and Karhila, 2011). The task here is further made more complex by requiring the listeners to rate speaker similarity across languages.…”
Section: Number Of Adaptation Sentences (mentioning)
confidence: 99%
“…A well-known method for data augmentation is speaker adaptation, where the most common approach is to build an average voice model of multiple speakers and then adapt a model for new (target) speaker from it. Speaker adaptation is a well-researched topic in HMM-based speech synthesis [4,5,6,7,8,9] but still relatively unexplored for DNN-based synthesis. Arik et al [10] found that speaker adaptation by fine-tuning (i.e.…”
Section: Data Augmentation (mentioning)
confidence: 99%