ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
DOI: 10.1109/icassp43922.2022.9746212

Polyphone Disambiguation and Accent Prediction Using Pre-Trained Language Models in Japanese TTS Front-End

Citations: cited by 3 publications (1 citation statement)
References: 17 publications
“…(1) Parallel corpus of different accents of the same speaker using source and target speech content and time alignment (Finkelstein et al, 2022; Liu et al, 2022; Hida et al, 2022; Toda et al, 2007; Oyamada et al, 2017). (2) Non-parallel corpus of multiple speakers with multiple accents using inconsistent source and target speech content (Wang et al, 2021; Zhao et al, 2018, 2019; Kaneko et al, 2019, 2020a, 2021). Finkelstein et al (2022) used a multi-stage trained TTS model to achieve transfer of North American accents, Australian accents, and British accents, and used a CHiVE-BERT pre-training model to enhance the audio effect of accent generation.…”
Section: Introduction
confidence: 99%