2019
DOI: 10.1109/taslp.2019.2926754

Using Phonetic Posteriorgram Based Frame Pairing for Segmental Accent Conversion

Cited by 13 publications (3 citation statements)
References 46 publications
“…This section briefly describes the most popular datasets for accent conversion. Librispeech [4] is often used for training intermediary models, such as acoustic models [5]–[7], or as a source of pre-trained models [8], [9]. It contains more than 1,000 hours of English speech from 1,166 speakers, with pre-built language models and data for training language models.…”
Section: A. Literature Review (mentioning)
confidence: 99%
“…from an ASR system) to estimate the posterior probability that each frame belongs to a set of pre-defined phonetic units (e.g., a phonetic posteriorgram, or PPG [27]). Once a PPG has been computed for each source and target frame in the corpus, the two can be paired in a many-to-many fashion based on the similarity between their respective PPGs [5,28].…”
Section: Accent Conversion (mentioning)
confidence: 99%
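The frame-pairing step this excerpt describes lends itself to a short sketch. The following is a minimal, hypothetical illustration, not the paper's implementation: it assumes PPGs are stored as NumPy arrays of shape (frames, phonetic units), uses cosine similarity as the matching metric (the excerpt does not specify one), and invents the name `pair_frames` for illustration.

```python
import numpy as np

def pair_frames(source_ppg: np.ndarray, target_ppg: np.ndarray, top_k: int = 1):
    """Pair source and target frames by PPG similarity (hypothetical sketch).

    Each PPG row is a posterior distribution over phonetic units, so rows
    sum to 1. Frames are matched many-to-many: every source frame is paired
    with its top_k most similar target frames. The paper's actual distance
    metric and pairing constraints may differ.
    """
    # Cosine similarity between every (source, target) frame pair.
    s = source_ppg / np.linalg.norm(source_ppg, axis=1, keepdims=True)
    t = target_ppg / np.linalg.norm(target_ppg, axis=1, keepdims=True)
    sim = s @ t.T                                # shape: (n_source, n_target)

    # For each source frame, indices of the top_k most similar target frames.
    return np.argsort(-sim, axis=1)[:, :top_k]

# Usage with random stand-in posteriors over 40 phonetic units.
rng = np.random.default_rng(0)
src = rng.dirichlet(np.ones(40), size=200)       # 200 source frames
tgt = rng.dirichlet(np.ones(40), size=180)       # 180 target frames
pairs = pair_frames(src, tgt, top_k=3)
print(pairs.shape)                               # (200, 3)
```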
“…In this work, we propose a new methodology to examine the role of accent and voice quality in talker recognition. Our methodology relies on the use of "accent conversion" techniques [5, 6] to transform utterances from second-language (L2) learners to mimic the pronunciation patterns (i.e., accent) of a native (L1) speaker, and vice versa. Our accent conversion model consists of three basic components: an acoustic model that generates a speaker-independent embedding of an utterance (a posteriorgram, or PPG), a sequence-to-sequence (seq2seq) synthesizer that maps PPGs into Mel-spectrograms, and a vocoder that maps the Mel-spectrogram into a high-quality speech waveform.…”
Section: Introduction (mentioning)
confidence: 99%
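The three-component pipeline this excerpt describes (acoustic model → seq2seq synthesizer → vocoder) can be sketched as a simple function composition. The stubs below are hypothetical placeholders, not the authors' models; the hop size, phone count, Mel dimension, and function names are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-ins for the three components described in the excerpt;
# each would be a trained neural network in practice.

def acoustic_model(waveform: np.ndarray, n_phones: int = 40) -> np.ndarray:
    """Speaker-independent acoustic model: waveform -> PPG.
    Stub: returns uniform posteriors, one frame per 10 ms hop at 16 kHz."""
    n_frames = max(1, len(waveform) // 160)
    return np.full((n_frames, n_phones), 1.0 / n_phones)

def seq2seq_synthesizer(ppg: np.ndarray, n_mels: int = 80) -> np.ndarray:
    """Seq2seq synthesizer: PPG -> Mel-spectrogram in the target voice.
    Stub: returns zeros of the right shape."""
    return np.zeros((ppg.shape[0], n_mels))

def vocoder(mel: np.ndarray, hop: int = 160) -> np.ndarray:
    """Vocoder: Mel-spectrogram -> speech waveform. Stub."""
    return np.zeros(mel.shape[0] * hop)

def convert_accent(l2_waveform: np.ndarray) -> np.ndarray:
    """Full pipeline: L2 utterance -> PPG -> Mel -> converted waveform."""
    ppg = acoustic_model(l2_waveform)    # pronunciation content only
    mel = seq2seq_synthesizer(ppg)       # rendered in the target voice/accent
    return vocoder(mel)

out = convert_accent(np.zeros(16000))    # 1 s of silence at 16 kHz
print(out.shape)                         # (16000,)
```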