2021
DOI: 10.48550/arxiv.2110.08583
Preprint

ASR4REAL: An extended benchmark for speech models

Abstract: Popular ASR benchmarks such as Librispeech and Switchboard are limited in the diversity of settings and speakers they represent. We introduce a set of benchmarks matching real-life conditions, aimed at spotting possible biases and weaknesses in models. We find that even though recent models do not seem to exhibit a gender bias, they usually show important performance discrepancies by accent, and even larger ones depending on the socio-economic status of the speakers. Finally, all tested models…

Cited by 3 publications (6 citation statements)
References 14 publications
“…However, little investigation has been carried out into estimating the WER gap produced by gender disparities; see, e.g., [15,16,17]. In this work, we perform an analysis by fine-tuning E2E models with ATC audio from different genders.…”
Section: Contribution and Motivation
confidence: 99%
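The WER gap discussed in this citation can be estimated by computing word error rate separately per speaker group and comparing the results. A minimal sketch (the group labels and transcript pairs below are illustrative, not data from the cited work):

```python
# Per-group word error rate (WER) via word-level Levenshtein distance.
def edit_distance(ref, hyp):
    # Classic dynamic-programming edit distance over word lists.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n]

def wer(pairs):
    # WER = total word-level edits / total reference words, pooled over utterances.
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words

# Hypothetical (reference, hypothesis) transcript pairs per speaker group.
groups = {
    "female": [("turn left heading two ninety", "turn left heading to ninety")],
    "male":   [("climb flight level three five zero", "climb flight level three five zero")],
}
per_group = {g: wer(pairs) for g, pairs in groups.items()}
gap = max(per_group.values()) - min(per_group.values())
# per_group → {"female": 0.2, "male": 0.0}; gap → 0.2
```

Pooling edits before dividing (rather than averaging per-utterance WERs) is the standard convention, since it weights each reference word equally.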
“…All the experiments are based on the most robust E2E model from Table 2, i.e., w2v2-L-60K [17]. The WER plots are obtained with greedy decoding and no LM or explicit textual information added. We fine-tune 18 models, varying the training data set (either NATS or ISAVIA) and the amount of fine-tuning samples.…”
Section: Do Multilingual Pre-trained E2E Models Help?
confidence: 99%
“…Previous research measuring bias in speech processing models largely studies differences in performance on specific speech tasks, for data sourced from people of differing social groups. Social group-based performance comparisons exist for Automated Speech Recognition (ASR) (Tatman, 2017; Tatman and Kasten, 2017; Koenecke et al., 2020; Feng et al., 2021; Liu et al., 2022b; Riviere et al., 2021), Speaker Verification or Speaker Identification (SID) (Hutiri and Ding, 2022; Fenu et al., 2020, 2021; Fenu and Marras, 2022; Chen et al., 2022b; Meng et al., 2022), as well as a number of other speech tasks (Meng et al., 2022; Hutiri et al., 2023). Differences in model performance based on the gender (Tatman, 2017; Tatman and Kasten, 2017; Chen et al., 2022b; Feng et al., 2021; Liu et al., 2022b; Hutiri and Ding, 2022; Fenu et al., 2020, 2021; Fenu and Marras, 2022; Riviere et al., 2021), dialect (Tatman, 2017; Tatman and Kasten, 2017), race (Koenecke et al., 2020; Tatman and Kasten, 2017; Chen et al., 2022b; Riviere et al., 2021), age (Fenu et al., 2020), city (Koenecke et al., 2020), nationality (Hutiri and Ding, 2022), and native language (Feng et al., 2021) of the speaker have been tested.…”
Section: Bias In Speech Models
confidence: 99%
“…Research in fairness for speech recognition is still in its nascent stage, but is of great importance to society given the increasing pervasiveness of ASR technology. Prior studies have shown ASR performance disparities based on gender, age, race and ethnic backgrounds [8,9,10,11,12,13]. We hope to advance the work in this area with three contributions: (1) an assessment of fairness in speech recognition at scale; (2) a novel approach for automated discovery of underperforming cohorts; and (3) to the best of our knowledge, a first report on fairness mitigation for production-scale ASR systems.…”
Section: Introduction
confidence: 95%