Evaluating synthetic speech generated by machines is a complicated process, as it requires judging along multiple dimensions, including naturalness, intelligibility, and whether the intended purpose is fulfilled. While subjective listening tests conducted with human participants have long been the gold standard for synthetic speech evaluation, their costly and labor-intensive design has also motivated the development of automated objective evaluation protocols. In this review, we first provide a historical view of listening test methodologies, from early in-lab comprehension tests to recent large-scale crowdsourced mean opinion score (MOS) tests. We then recap the development of automatic measures, ranging from signal-based metrics to model-based approaches that utilize deep neural networks or even the latest self-supervised learning techniques. We also describe the VoiceMOS Challenge series, a scientific event we founded that aims to promote the development of data-driven synthetic speech evaluation. Finally, we provide insights into unsolved issues in this field as well as future prospects. This review is expected to serve as an entry point for early-career academic researchers to enrich their knowledge of this field, as well as for speech synthesis practitioners to catch up on the latest developments.