9th ISCA Speech Synthesis Workshop (SSW 9), 2016
DOI: 10.21437/ssw.2016-21

Non-intrusive Quality Assessment of Synthesized Speech using Spectral Features and Support Vector Regression

Abstract: In this paper, we propose a new quality assessment method for synthesized speech. Unlike previous approaches, which use a Hidden Markov Model (HMM) trained on natural utterances as a reference model to predict the quality of synthesized speech, the proposed approach uses knowledge about synthesized speech while training the model. The previous approach has been successfully applied to the quality assessment of synthesized speech for the German language. However, it gave poor results for English language databases su…

Cited by 6 publications (3 citation statements)
References 11 publications
“…We used non-intrusive speech quality assessment (NISQA) to predict the naturalness of the voices used in the experiment. NISQA is a model that can automatically evaluate super-wideband speech quality without the need for a clean reference signal [18]. The same model was used to predict the mean opinion score (MOS) in terms of the naturalness of the speech samples [19].…”
Section: Naturalness (mentioning)
confidence: 99%
“…In [12] a double-ended speech naturalness model was presented that uses the English Blizzard Challenge data from 2008-2013 for training and evaluating with a per-system PCC of 0.84. In [13] a naturalness prediction model based on spectral features has been proposed that obtained PCCs between 0.69 and 0.89 on the Blizzard Challenge data from 2008-2010 and 2012. More recently, also, neural networks have been used to predict the naturalness of synthesized speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…Due to this limitation, there is an increasing focus on non-intrusive assessments. Any synthesized speech can be scored according to the predictions from machine learning models such as Support Vector Machines (Soni and Patil, 2016) and Neural…”
Section: Chapter I Introduction (mentioning)
confidence: 99%
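
The citation statements above report results as a per-system Pearson correlation coefficient (PCC) between predicted and subjective scores. As a rough illustration of what such a figure measures, the sketch below averages scores per synthesis system and then correlates the two sets of averages. The function names, the record layout, and the sample scores are hypothetical, and the averaging step is an assumption about what "per-system" means here; it is not taken from the evaluation protocol of any cited paper.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def per_system_pcc(records):
    """Correlate per-system mean scores.

    records: iterable of (system_id, subjective_mos, predicted_mos).
    Assumption: 'per-system' means scores are averaged over all
    utterances of a system before the correlation is computed.
    """
    subjective = defaultdict(list)
    predicted = defaultdict(list)
    for system_id, subj, pred in records:
        subjective[system_id].append(subj)
        predicted[system_id].append(pred)
    systems = sorted(subjective)
    subj_means = [sum(subjective[s]) / len(subjective[s]) for s in systems]
    pred_means = [sum(predicted[s]) / len(predicted[s]) for s in systems]
    return pearson(subj_means, pred_means)


# Hypothetical scores for three systems, two utterances each.
example_records = [
    ("A", 1.5, 2.0), ("A", 2.5, 3.0),
    ("B", 3.0, 3.0), ("B", 3.0, 3.0),
    ("C", 3.5, 3.0), ("C", 4.5, 4.0),
]
example_pcc = per_system_pcc(example_records)
```

A value near 1 means the predictor ranks and spaces the systems much as listeners do; it says nothing about whether the predicted scores are on the same absolute scale as the subjective ones.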