2021
DOI: 10.48550/arxiv.2104.01818
Preprint

The Multi-speaker Multi-style Voice Cloning Challenge 2021

Abstract: The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common sizable dataset as well as a fair testbed for the benchmarking of the popular voice cloning task. Specifically, we formulate the challenge to adapt an average TTS model to the stylistic target voice with limited data from target speaker, evaluated by speaker identity and style similarity. The challenge consists of two tracks, namely few-shot track and one-shot track, where the participants are required to clone multiple targ…

Cited by 1 publication (1 citation statement)
References 37 publications
“…It is known as different terms in academia and industry, such as voice adaptation [43], voice cloning [9], custom voice [39], etc. Adaptive TTS has been a hot research topic, e.g., a lot of works in statistic parametric speech synthesis have studied voice adaptation [77,386,443,65,122], and the recent voice cloning challenge also attracts a lot of participants [388,118,331,45]. In adaptive TTS scenario, a source TTS model (usually trained on a multi-speaker speech dataset) is usually adapted with few adaptation data for each target voice.…”
Section: Adaptive TTS (mentioning)
confidence: 99%