ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9746395
Generalization Ability of MOS Prediction Networks

Cited by 55 publications (38 citation statements)
References 24 publications

Citation statements (ordered by relevance):

“…In the next experiment, we used two state-of-the-art speech assessment models as our baselines: (1) MOSNet: a model that is based on a CNN-BLSTM architecture for predicting MOS scores [63]; (2) MOS-SSL: a model that uses features from fine-tuned wav2vec 2.0 to predict MOS scores [56]. Both models were trained on the TMHINT-QI dataset with a single-task criterion to predict the quality or intelligibility score separately.…”
Section: Model
Confidence: 99%

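The MOSNet baseline mentioned in this excerpt is a frame-level CNN-BLSTM regressor. The sketch below illustrates that general design in PyTorch; the layer sizes, the log-mel input, and the frame-averaging step are illustrative assumptions, not the published MOSNet configuration.

import torch
import torch.nn as nn

class CNNBLSTMMOS(nn.Module):
    """CNN over a spectrogram, BLSTM, then average frame scores to one MOS."""
    def __init__(self, n_mels: int = 80):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.blstm = nn.LSTM(32 * n_mels, 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 128, 1)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, frames, n_mels) log-mel spectrogram
        x = self.cnn(spec.unsqueeze(1))                # (batch, 32, frames, n_mels)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.blstm(x)                           # (batch, frames, 256)
        frame_scores = self.head(x).squeeze(-1)        # per-frame quality score
        return frame_scores.mean(dim=1)                # utterance-level MOS estimate

model = CNNBLSTMMOS()
print(model(torch.randn(2, 200, 80)).shape)            # torch.Size([2])

Averaging frame-level scores into an utterance-level prediction mirrors the general recipe the excerpt alludes to; the exact depths, losses, and input features used in [63] differ from this toy version.
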
“…Instead of directly using the outputs, we use the embeddings of these SSL models as the SSL features. For more details, please refer to [52] and [56]. Additionally, MOSA-Net adopts a multi-task learning criterion that simultaneously predicts multiple objective assessment metrics, including speech quality, intelligibility, and distortion scores.…”
Section: Introduction
Confidence: 99%

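As a concrete illustration of using an SSL model's embeddings rather than its task outputs, the sketch below pools wav2vec 2.0 hidden states into an utterance-level feature vector via the HuggingFace transformers interface. The checkpoint name and the simple mean pooling are assumptions made for illustration, not the actual MOSA-Net feature pipeline described in [52] and [56].

import torch
from transformers import Wav2Vec2Model

ssl = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")  # assumed checkpoint
ssl.eval()

def ssl_features(waveform: torch.Tensor) -> torch.Tensor:
    # waveform: (batch, samples) at 16 kHz -> (batch, hidden_size) feature vector
    with torch.no_grad():
        frames = ssl(waveform).last_hidden_state   # frame-level SSL embeddings
    return frames.mean(dim=1)                      # simple time-average pooling

print(ssl_features(torch.randn(1, 16000)).shape)   # torch.Size([1, 768]) for the base model
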
“…Wav2vec 2.0 [1] is a self-supervised framework for speech representation which has been used for a large variety of different speech-related tasks [5,22,23]. One of the main advantages of the wav2vec approach is that a generic pretrained model can be fine-tuned for a specific purpose using only a small amount of labeled data.…”
Section: Model For Audio-based Detection
Confidence: 99%

“…wav2vec 2.0 was shown to obtain good baseline results for this challenge in SSL-MOS [14]. The speech waveform is input to wav2vec 2.0.…”
Section: Architectures
Confidence: 99%

“…The advancements in transformer-based pretrained models such as wav2vec 2.0 [12] and HuBERT [13] enable researchers to explore another semi-supervised method to take advantage of the large amount of speech data that exists without subjective labels. SSL-MOS [14] was one of the baselines provided by the challenge organizers which uses wav2vec 2.0 with a minimal extra layer with promising results.…”
Section: Introduction
Confidence: 99%

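The SSL-MOS baseline these excerpts refer to attaches a minimal output layer to wav2vec 2.0 and fine-tunes it on listener ratings. A rough sketch of that idea is shown below, assuming the HuggingFace Wav2Vec2Model interface; the mean pooling and single linear head follow the "minimal extra layer" description, while the checkpoint name and training loss are assumptions rather than the paper's exact recipe.

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class SSLMOS(nn.Module):
    """Pretrained wav2vec 2.0 encoder plus one linear layer regressing MOS."""
    def __init__(self, ssl_name: str = "facebook/wav2vec2-base"):
        super().__init__()
        self.ssl = Wav2Vec2Model.from_pretrained(ssl_name)     # fine-tuned jointly
        self.head = nn.Linear(self.ssl.config.hidden_size, 1)  # minimal extra layer

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) of raw 16 kHz speech, fed directly to wav2vec 2.0
        frames = self.ssl(waveform).last_hidden_state     # (batch, frames, hidden)
        return self.head(frames.mean(dim=1)).squeeze(-1)  # (batch,) predicted MOS

model = SSLMOS()
scores = model(torch.randn(2, 16000))   # two 1-second utterances
# Training would regress these scores against human MOS labels (e.g., L1 or MSE loss).
print(scores.shape)                     # torch.Size([2])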