Interspeech 2020
DOI: 10.21437/interspeech.2020-3162

Black-Box Adaptation of ASR for Accented Speech

Abstract: We introduce the problem of adapting a black-box, cloud-based ASR system to speech from a target accent. While leading online ASR services obtain impressive performance on mainstream accents, they perform poorly on sub-populations: we observed that the word error rate (WER) achieved by Google's ASR API on Indian accents is almost twice the WER on US accents. Existing adaptation methods either require access to model parameters or overlay an error-correcting module on output transcripts. We highlight the need f…

Cited by 5 publications (4 citation statements)
References 19 publications
“…Khandelwal et al. [37] propose a novel coupling of an open-source accent-tuned local model with a commercial ASR model trained on non-accented speech. The authors introduce the "FineMerge" algorithm to enable a commercial ASR system to guide inference of an accent-tuned ASR system at a character level.…”
Section: System Combination
Citation type: mentioning
confidence: 99%
“…In recent years, cloud-based speech-to-text services from technology companies such as Google and Amazon have become popular, offering flexibility and convenience. However, while these services exhibit impressive performance for general use-cases, they are not optimal for domain-specific vocabulary [12,13,14]. Moreover, these services are provided as black boxes without access to the inner workings of their ASR models, making it impossible to finetune on domain-specific tasks and datasets.…”
Section: Reference
Citation type: mentioning
confidence: 99%
“…With the proliferation of open-source speech technology toolkits and cheap(er) cloud computing, some publics may be able to build or modify these technologies without much or any support from governments or corporations (e.g. Masakhane, Khandelwal et al. (2020)). Mugar and Gordon (2020) also emphasise that, in their vision, the aim of civic design by and for publics is care rather than innovation, and space for meaningful interaction between people, rather than efficiency.…”
Section: Speech Technology Design As Civic Design
Citation type: mentioning
confidence: 99%