ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414830
Error-Driven Fixed-Budget ASR Personalization for Accented Speakers

Abstract: We consider the task of personalizing ASR models under a fixed budget on recording speaker-specific utterances. Given a speaker and an ASR model, we propose a method of identifying sentences for which the speaker's utterances are likely to be harder for the given ASR model to recognize. We assume a tiny amount of speaker-specific data to learn phoneme-level error models, which help us select such sentences. We show that the speaker's utterances on the sentences selected using our error model ind…
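The abstract describes learning a phoneme-level error model from a small amount of speaker-specific data, then scoring candidate sentences by how error-prone their phonemes are for that speaker. A minimal sketch of that idea is below; the function names, the position-wise alignment, and the additive scoring rule are all assumptions for illustration, not the paper's exact formulation.

```python
def learn_phoneme_error_rates(transcripts, hypotheses, phonemize):
    """Estimate per-phoneme error probabilities from a tiny amount of
    speaker-specific data by comparing references with ASR hypotheses."""
    errors, counts = {}, {}
    for ref, hyp in zip(transcripts, hypotheses):
        ref_ph, hyp_ph = phonemize(ref), phonemize(hyp)
        # Naive position-wise comparison; a real system would use an
        # edit-distance alignment of the two phoneme sequences.
        for i, p in enumerate(ref_ph):
            counts[p] = counts.get(p, 0) + 1
            if i >= len(hyp_ph) or hyp_ph[i] != p:
                errors[p] = errors.get(p, 0) + 1
    return {p: errors.get(p, 0) / counts[p] for p in counts}

def select_sentences(candidates, error_rate, phonemize, budget):
    """Pick the `budget` sentences with the highest expected number of
    phoneme errors, i.e. those likely hardest for this speaker."""
    def score(sentence):
        return sum(error_rate.get(p, 0.0) for p in phonemize(sentence))
    return sorted(candidates, key=score, reverse=True)[:budget]
```

Selecting the highest-scoring sentences concentrates the recording budget on material the model is predicted to get wrong, which is the mechanism the paper credits for beating random sentence selection.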

Cited by 2 publications (1 citation statement)
References 30 publications
“…The majority of these existing approaches have focused on earlier ASR systems rather than on Deep Neural Network (DNN) based models. Although model pruning has been explored for self-supervised and other ASR models (Lai et al., 2021; Wu et al., 2021; Zhen et al., 2021), data subset selection for fine-tuning self-supervised ASR systems has only been explored in the context of personalization for accented speakers (Awasthi et al., 2021). A phoneme-level error model is proposed which selects sentences that yield a lower test WER than random sentence selection.…”
Section: Related Work
confidence: 99%