2019
DOI: 10.1007/978-3-030-27947-9_26

Czech Speech Synthesis with Generative Neural Vocoder

Cited by 7 publications (5 citation statements)
References 10 publications
“…Contrary to it, the listening tests rated the Synt3 as the best, then the Synt1 as medium, and the Synt2 as the worst-see the 3D bar-graph in Figure 12c. It also indicates similarity between Synt1 and Synt2 types for the female voice F2 (MUSHRA scores are 48.5% vs. 48.9% [17]). Our speech features used for GMM-based evaluation apparently reflect better naturalness of the USEL synthesis using units of original speech recordings, although it causes undesirable artifacts due to concatenation of these units [19].…”
Section: Discussion Of the Obtained Results
Citation type: mentioning
confidence: 89%
“…The second collected speech corpus (SC2) consists of four parts: the natural speech uttered by the original speakers and three variations of speech synthesis: the USEL based TTS system (assigned to Synt1) and two LSTM based systems with different vocoders: conventional WORLD (further referred to as Synt2) [16], WaveRNN (referred to as Synt3) [17]. As in the case of SC1, the original and synthetic speech originated from the speakers M1, M2, and F1, F2.…”
Section: Materials Used Initial Settings and Conditions
Citation type: mentioning
confidence: 99%