Interspeech 2021
DOI: 10.21437/interspeech.2021-1685

Non-Intrusive Speech Quality Assessment with Transfer Learning and Subject-Specific Scaling

Cited by 5 publications
(6 citation statements)
“…The work in [18]-[57] describes ML-based approaches to NR quality estimation. Some of these NR tools produce estimates of subjective test scores that report speech or sound quality mean opinion score (MOS) [18]-[20], [25]-[28], [31], [36], [40], [42], [43], [45], [49], [50], [57], naturalness [29], [35], [37], listening effort [24], noise intrusiveness [50], and speech intelligibility [21], [33]. The non-intrusive speech quality assessment model called NISQA [53] uses log-mel-spectrograms to produce estimates of subjective speech quality as well as four constituent dimensions: noisiness, coloration, discontinuity, and loudness.…”
Section: A. Existing Machine Learning Approaches
Citation type: mentioning
confidence: 99%
“…Sufficient datasets are rare and expensive to generate through laboratory testing, so crowd-sourced tests are becoming common. Joint training or transfer learning can leverage objective FR quality values [26], [31], [50] or impairment categories [49] alongside MOS values to maximize the benefit of those MOS values. Semi-supervised learning [42] is also an effective way to compensate for scarce MOS values.…”
Section: A. Existing Machine Learning Approaches
Citation type: mentioning
confidence: 99%
“…As machine learning (ML) has become more powerful and accessible, numerous research groups have sought to apply ML to develop NR tools [17]-[50]. Some of these NR tools produce estimates of subjective test scores that report speech or sound quality mean opinion score (MOS) [17]-[19], [24]-[27], [30], [35], [38], [40], [41], [46], [47], naturalness [28], [34], [36], listening effort [23], noise intrusiveness [47], and speech intelligibility [20], [32]. The non-intrusive speech quality assessment model called NISQA [50] produces estimates of subjective speech quality as well as four constituent dimensions: noisiness, coloration, discontinuity, and loudness.…”
Section: A. Existing Machine Learning Approaches
Citation type: mentioning
confidence: 99%
“…Sufficient datasets are rare and expensive to generate through laboratory testing, so crowd-sourced tests are becoming common. Joint training or transfer learning can leverage objective FR quality values [25], [30], [47] or impairment categories [46] alongside MOS values to maximize the benefit of those MOS values. Semi-supervised learning [40] is also an effective way to compensate for scarce MOS values.…”
Section: A. Existing Machine Learning Approaches
Citation type: mentioning
confidence: 99%