Methods for foreign accent conversion (FAC) aim to generate speech that sounds similar to a given non-native speaker but with the accent of a native speaker. Conventional FAC methods borrow excitation information (F0 and aperiodicity, produced by a conventional vocoder) from a reference (i.e., native) utterance at synthesis time. As such, the generated speech retains some aspects of the voice quality of the native speaker. We present a framework for FAC that eliminates the need for conventional vocoders (e.g., STRAIGHT, World) and therefore the need to use the native speaker's excitation. Our approach uses an acoustic model trained on a native speech corpus to extract speaker-independent phonetic posteriorgrams (PPGs), and then trains a speech synthesizer to map PPGs from the non-native speaker to the corresponding spectral features, which in turn are converted into the audio waveform by a high-quality neural vocoder. At runtime, we drive the synthesizer with the PPG extracted from a native reference utterance. Listening tests show that the proposed system produces speech that is clearer, more natural, and more similar to the non-native speaker than that of a baseline system, while significantly reducing the perceived foreign accent of non-native utterances.
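To make the runtime pipeline concrete, the sketch below is a minimal, hypothetical PyTorch rendering of the conversion path: a speaker-independent acoustic model produces the PPG from the native reference, a speaker-dependent synthesizer maps the PPG to spectral features, and a neural vocoder generates the waveform. The module internals, the dimensions `N_PPG` and `N_MELS`, and the placeholder vocoder are illustrative assumptions, not the models described in the paper.

```python
# Minimal sketch of the runtime FAC pipeline, assuming PyTorch.
# All modules here are illustrative stand-ins: in the described system, the
# acoustic model is trained on a native corpus, the synthesizer is trained on
# the non-native speaker, and a neural vocoder replaces STRAIGHT/World.
import torch
import torch.nn as nn

N_PPG = 40    # hypothetical number of phonetic classes in the PPG
N_MELS = 80   # hypothetical mel-spectrogram dimensionality

class AcousticModel(nn.Module):
    """Stand-in for the speaker-independent acoustic model: maps acoustic
    frames to per-frame phoneme posteriors (the PPG)."""
    def __init__(self, n_input=13, n_ppg=N_PPG):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_input, 256), nn.ReLU(),
                                 nn.Linear(256, n_ppg))

    def forward(self, frames):                  # frames: (T, n_input)
        return torch.softmax(self.net(frames), dim=-1)  # PPG: (T, n_ppg)

class Synthesizer(nn.Module):
    """Stand-in for the PPG-to-spectrum synthesizer trained on the
    non-native speaker's voice."""
    def __init__(self, n_ppg=N_PPG, n_mels=N_MELS):
        super().__init__()
        self.rnn = nn.GRU(n_ppg, 256, batch_first=True)
        self.proj = nn.Linear(256, n_mels)

    def forward(self, ppg):                     # ppg: (T, n_ppg)
        h, _ = self.rnn(ppg.unsqueeze(0))       # add batch dim
        return self.proj(h).squeeze(0)          # mel: (T, n_mels)

def convert(native_frames, acoustic_model, synthesizer, vocoder):
    """Accent conversion at runtime: the PPG comes from the NATIVE reference
    utterance, so no native excitation (F0/aperiodicity) is borrowed."""
    ppg = acoustic_model(native_frames)   # speaker-independent phonetic content
    mel = synthesizer(ppg)                # spectrum in the non-native voice
    return vocoder(mel)                   # waveform via the neural vocoder

if __name__ == "__main__":
    am, syn = AcousticModel(), Synthesizer()
    # Placeholder for a trained neural vocoder (e.g., a WaveNet-style model).
    vocoder = lambda mel: torch.randn(mel.shape[0] * 256)
    native_frames = torch.randn(120, 13)  # fake acoustic frames, 120 frames
    wav = convert(native_frames, am, syn, vocoder)
    print(wav.shape)
```

The key design point the sketch illustrates is that the only information carried over from the native reference is the PPG, a soft phonetic representation, so the synthesized voice quality comes entirely from the synthesizer trained on the non-native speaker.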