Interspeech 2019
DOI: 10.21437/interspeech.2019-1778

Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams

Abstract: Methods for foreign accent conversion (FAC) aim to generate speech that sounds similar to a given non-native speaker but with the accent of a native speaker. Conventional FAC methods borrow excitation information (F0 and aperiodicity; produced by a conventional vocoder) from a reference (i.e., native) utterance during synthesis time. As such, the generated speech retains some aspects of the voice quality of the native speaker. We present a framework for FAC that eliminates the need for conventional vocoders (e…

Cited by 33 publications (31 citation statements)
References 31 publications
“…al. [28], which has been shown to produce higher ratings of acoustic quality and naturalness than traditional systems which use conventional vocoders such as STRAIGHT [29] or World [30]. The system consists of three components: an acoustic model (AM) that extracts phonetic posteriorgrams (PPGs) from source utterances, a sequence-to-sequence (seq2seq) synthesizer that maps PPGs to Mel-spectrograms, and a WaveGlow vocoder that synthesizes speech waveforms from Mel-spectrograms.…”
[Figure 2: PPG-to-Mel conversion model]
Section: Methods (mentioning)
confidence: 99%
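
The three-component data flow described in the statement above can be illustrated with a minimal sketch. Everything in it is a toy stand-in rather than the cited system: the function names (`extract_ppg`, `ppg_to_mel`, `vocode`), the dimensions (40 phonetic units, 80 Mel channels, 160 samples per frame), and the random weights are all assumptions chosen only to show which array shapes pass between the stages.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
N_PHONETIC_UNITS = 40   # size of the phonetic unit inventory
N_MEL = 80              # number of Mel channels
HOP = 160               # waveform samples per analysis frame

rng = np.random.default_rng(0)
# Random projection standing in for a trained PPG-to-Mel model.
TOY_WEIGHTS = rng.normal(size=(N_PHONETIC_UNITS, N_MEL))


def extract_ppg(wave):
    """Stand-in for the acoustic model: one posterior distribution per frame."""
    n_frames = len(wave) // HOP
    logits = rng.normal(size=(n_frames, N_PHONETIC_UNITS))
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def ppg_to_mel(ppg):
    """Stand-in for the seq2seq synthesizer.

    A real seq2seq model need not keep the frame count; this toy version
    maps each frame independently through a fixed linear projection.
    """
    return ppg @ TOY_WEIGHTS


def vocode(mel):
    """Stand-in for the neural vocoder: emits HOP samples per Mel frame."""
    return np.repeat(mel.mean(axis=1), HOP)


def convert(wave):
    """Chain the three stages: waveform -> PPG -> Mel-spectrogram -> waveform."""
    return vocode(ppg_to_mel(extract_ppg(wave)))


source = rng.normal(size=16000)      # placeholder for a source utterance
ppg = extract_ppg(source)            # shape (100, 40)
mel = ppg_to_mel(ppg)                # shape (100, 80)
converted = convert(source)          # shape (16000,)
```

The point of the sketch is the interface between stages: the synthesizer sees only the PPG, not the source waveform or any vocoder excitation parameters.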
“…from an ASR system) to estimate the posterior probability that each frame belongs to a set of pre-defined phonetic units (e.g., a phonetic posteriorgram, or PPG [27]). Once a PPG has been computed for each source and target frame in the corpus, the two can be paired in a many-to-many fashion based on the similarity between their respective PPGs [5,28].…”
Section: Accent Conversion (mentioning)
confidence: 99%
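
The frame pairing described in the statement above can be sketched as follows. The statement does not say which similarity measure is used, so this example assumes symmetric KL divergence between per-frame posteriors. It also assumes one simple reading of "many-to-many": each source frame is paired with its closest target frame and each target frame with its closest source frame. The helper names (`softmax`, `symmetric_kl`, `pair_frames`) are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Turn per-frame acoustic-model scores into posteriors (rows sum to 1)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def symmetric_kl(p, q, eps=1e-10):
    """Pairwise symmetric KL divergence between rows of p (n, d) and q (m, d)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    log_p, log_q = np.log(p), np.log(q)
    # KL(p_i || q_j) = sum_k p_ik * (log p_ik - log q_jk)
    kl_pq = (p * log_p).sum(axis=1)[:, None] - p @ log_q.T
    # KL(q_j || p_i) = sum_k q_jk * (log q_jk - log p_ik)
    kl_qp = (q * log_q).sum(axis=1)[None, :] - log_p @ q.T
    return kl_pq + kl_qp


def pair_frames(src_ppg, tgt_ppg):
    """Pair frames in both directions by smallest PPG divergence.

    Returns sorted (source_index, target_index) tuples. A frame may appear
    in several pairs, which is what makes the pairing many-to-many.
    """
    dist = symmetric_kl(src_ppg, tgt_ppg)
    src_to_tgt = dist.argmin(axis=1)   # closest target for each source frame
    tgt_to_src = dist.argmin(axis=0)   # closest source for each target frame
    pairs = {(i, int(j)) for i, j in enumerate(src_to_tgt)}
    pairs |= {(int(i), j) for j, i in enumerate(tgt_to_src)}
    return sorted(pairs)


# Toy PPGs over three phonetic units (made-up scores, not real data).
src_ppg = softmax(np.array([[5.0, 0.0, 0.0],
                            [0.0, 5.0, 0.0]]))
tgt_ppg = softmax(np.array([[0.0, 5.0, 0.0],
                            [5.0, 0.0, 0.0],
                            [1.0, 0.0, 5.0]]))
pairs = pair_frames(src_ppg, tgt_ppg)
```

Because the comparison happens in the posterior space rather than on acoustic features, frames are matched by phonetic content, which is the property the pairing relies on.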