ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053371
Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models

Cited by 16 publications (10 citation statements)
References 21 publications
“…Theoretically speaking, since speaker characteristics are self-contained within an utterance, we should be able to clone voices without using text. One practical approach is to obtain automatically annotated transcriptions using a SOTA ASR system [16]. However, ASR-predicted transcriptions contain wrong annotations, which degrades the performance of the adaptation.…”
Section: B. Training Voice Conversion System for Target Speaker
confidence: 99%
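The workflow described in the excerpt above — pseudo-labeling untranscribed target-speaker audio with an ASR system, then discarding likely-wrong annotations before adaptation — can be sketched as follows. This is a minimal illustration with hypothetical stub functions (`build_adaptation_set`, `fake_asr` are not APIs from the cited paper); a real pipeline would pass the surviving audio/transcript pairs to TTS fine-tuning.

```python
# Sketch of ASR-based pseudo-labeling for semi-supervised speaker adaptation:
# transcribe each untranscribed utterance with ASR, then keep only pairs whose
# confidence clears a threshold, since wrong annotations hurt adaptation.
# All names here are illustrative stubs, not the cited paper's implementation.

from typing import Callable, List, Tuple


def build_adaptation_set(
    utterances: List[str],
    asr_transcribe: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.8,
) -> List[Tuple[str, str]]:
    """Pseudo-label each utterance; keep only confident (audio, text) pairs."""
    pairs = []
    for audio in utterances:
        text, confidence = asr_transcribe(audio)
        if confidence >= min_confidence:  # filter likely-wrong annotations
            pairs.append((audio, text))
    return pairs


# Toy stand-in for a SOTA ASR model: returns a transcript plus a confidence.
def fake_asr(audio: str) -> Tuple[str, float]:
    return f"transcript of {audio}", 0.9 if "clean" in audio else 0.5


adaptation_set = build_adaptation_set(["clean_001.wav", "noisy_002.wav"], fake_asr)
print(adaptation_set)  # only the high-confidence pair survives
```

The surviving pairs would then serve as supervised data for fine-tuning the pretrained TTS model on the target speaker.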
“…When cloning voices, we fine-tuned both the acoustic and vocoder models with the transcribed speech of the targets. This system represented a simple supervised approach: fine-tuning a well-trained single-speaker model [16]. • VCM_u: our unsupervised VC system followed the adaptation process described in Sec.…”
Section: B. Capturing Unique Speaker Characteristics
confidence: 99%
“…[15] presents speaker, language, and stress/tone embeddings for TTS that enable synthesizing speech in multiple speaker identities and languages. Other techniques employ transfer learning to benefit from other speech processing tasks, such as speaker verification [16] or speech recognition [17].…”
Section: Introduction
confidence: 99%