2022
DOI: 10.25073/2588-1086/vnucsce.358

VLSP 2021 - TTS Challenge: Vietnamese Spontaneous Speech Synthesis

Abstract: Text-To-Speech (TTS) was one of nine shared tasks at the eighth annual international VLSP 2021 workshop. All three previous TTS shared tasks were conducted on read-speech datasets. However, the resulting synthetic voices were not natural enough for spoken dialog systems, where the computer must talk to a human in conversation. Speech datasets recorded in a spontaneous environment help a TTS system produce more natural voices in terms of speaking style, speaking rate, intonation, and so on. Therefore, in this shared task, participants w…

Citations: cited by 4 publications (7 citation statements)
References: 4 publications
“…In SUS (Semantically Unpredictable Sentences) intelligibility test, we reach 15.00% SER (Sentences Error Rate), the best performing system in TTS task. We believe that one of the main factors to explain the results is the data processing, while most teams almost similarly use the technique [2].…”
Section: Discussion (mentioning)
confidence: 99%
“…The VLSP Speech Synthesis Challenge 2021 [2] is focused on building Vietnamese spontaneous speech synthesizers provided speech data. Before receiving the dataset, participants must join to contribute to building it.…”
Section: _______ (mentioning)
confidence: 99%
“…The TTS shared task [14] organized at the eighth workshop of the Association for Vietnamese Language and Speech Processing (VLSP) requires participants to create a Vietnamese TTS system able to synthesize natural sounding audios while having trained on a spontaneous and noisy dataset. To be precise, this year's dataset for TTS uses speech crawled from videos of a female Hanoi YouTuber named "Giang oi".…”
Section: Introduction (mentioning)
confidence: 99%
“…Dataset of the competition [14] exploited the voice source from a female youtuber. The challenges of using spontaneous speech are i) poor quality (e.g., inconsistent speaking rate)…”
Section: Introduction (mentioning)
confidence: 99%