Czech Speech Synthesis with Generative Neural Vocoder

Vít, Jakub; Hanzlíček, Zdeněk; Matoušek, Jindřich

doi:10.1007/978-3-030-27947-9_26

Cited by 7 publications

(5 citation statements)

References 10 publications


“…Contrary to it, the listening tests rated the Synt3 as the best, then the Synt1 as medium, and the Synt2 as the worst; see the 3D bar-graph in Figure 12c. It also indicates similarity between Synt1 and Synt2 types for the female voice F2 (MUSHRA scores are 48.5% vs. 48.9% [17]). Our speech features used for GMM-based evaluation apparently reflect better naturalness of the USEL synthesis using units of original speech recordings, although it causes undesirable artifacts due to concatenation of these units [19].…”

Section: Discussion of the Obtained Results (mentioning)

confidence: 89%

“…The second collected speech corpus (SC2) consists of four parts: the natural speech uttered by the original speakers and three variations of speech synthesis: the USEL based TTS system (assigned to Synt1) and two LSTM based systems with different vocoders: conventional WORLD (further referred to as Synt2) [16], WaveRNN (referred to as Synt3) [17]. As in the case of SC1, the original and synthetic speech originated from the speakers M1, M2, and F1, F2.…”

Section: Materials Used Initial Settings and Conditions (mentioning)

confidence: 99%

“…One of them was uttered in high-quality original speech Orig and the three remaining ones were synthesized by the methods Synt1, Synt2, Synt3. This test, consisting of the same utterances for every listener, was undertaken by 18 listeners, with 8 of them having experience in speech synthesis [17]. The graphical comparison of the GMM-based evaluation results with the subjective results by the MUSHRA listening test can be found in Figure 12.…”

Section: Review 12 Of 19 (mentioning)

confidence: 99%

“…The listening test evaluations were carried out previously between the years 2017 and 2019 for different research purposes [11,17]. In both of the tests, the order of the utterances was randomized in each of the ten sets so that the synthesis method was not known to the listener in advance.…”

Section: Review 12 Of 19 (mentioning)

confidence: 99%

“…In the second basic evaluation experiment, the three tested types of speech synthesis were the following: (1) the basic USEL synthesis, (2) the synthesis using a deep neural network (DNN) with a long short-term memory (LSTM) and a conventional WORLD vocoder [16], (3) the synthesis using a recurrent neural network with the LSTM and a WaveRNN [17] vocoder. The speech synthesized by the methods using the neural networks is typologically different from that produced by the USEL synthesizer.…”

Section: Introduction (mentioning)

confidence: 99%


GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale

Přibil; Přibilová; Matoušek

2020

Applied Sciences

Self Cite


The paper focuses on the description of a system for the automatic evaluation of synthetic speech quality based on the Gaussian mixture model (GMM) classifier. The speech material originating from a real speaker is compared with synthesized material to determine similarities or differences between them. The final evaluation order is determined by distances in the Pleasure-Arousal (P-A) space between the original and synthetic speech using different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMM models for continual 2D detection of P-A classes are trained using the sound/speech material from the databases without any relation to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show a substantial influence of the number of mixtures, the number and type of the speech features used, the size of the processed speech material, as well as the type of the database used for the creation of the GMMs on the P-A classification process and on the final evaluation result. The main evaluation experiments confirm the functionality of the system developed. The objective evaluation results obtained are principally correlated with the subjective ratings of human evaluators; however, partial differences were indicated, so a subsequent detailed investigation must be performed.
