ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9746979

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

Cited by 5 publications
(3 citation statements)
References 29 publications
“…Detailed implementation issues over model configuration, and choice of input features when constructing the CEMs are also investigated in a series of ablation studies. In contrast, similar E2E ASR systems' decoder output probabilities smoothing approaches were only studied for confidence score estimation [81]-[84], but not for speaker adaptation of Conformer ASR systems as considered in this paper.…”
Section: B Speaker Adaptation for E2E ASR Systems; citation type: mentioning
confidence: 99%
“…A key issue in designing robust CEMs is to select the most informative prediction features to better distinguish the correct recognition hypotheses from erroneous ones. Inspired by previous research on confidence score estimation for E2E ASR systems [81], [84], various forms of neural features extracted from different components of the Conformer ASR system for CEM training are investigated in this paper. These include: 1) Internal embedding features that are extracted at the last layers of the encoder and decoder blocks; 2) 1-best output scores that are produced during beam search by either the decoder or CTC module alone, or their combination, for the current 1-best token, or utterance-level hypothesis being considered; 3) N-best output scores as an extension to 2) that can exploit additional information of modelling confusion over competing hypotheses or tokens; and 4) Feature fusion among the above from 1) to 3).…”
Section: A Feature Selection for CEM; citation type: mentioning
confidence: 99%
“…[20] proposed removing words with low confidence and selecting words with the highest confidence from multiple ASR hypotheses [21]. However, the measurement and use of confidence scores is a complicated task due to out-of-domain words, overconfidence, and so on [22,23]. Hence, we analyze how confidence varies with emotion by exploring the word confidence scores.…”
Section: Emotion and ASR Confidence; citation type: mentioning
confidence: 99%