Confidence Estimation for Attention-Based Sequence-to-Sequence Models for Speech Recognition

Li, Qiujia; Qiu, David; Li, Bo; He, Yanzhang; Woodland, Philip C.; Cao, Liangliang; Strohman, Trevor

doi:10.1109/icassp39728.2021.9414920

Cited by 34 publications

(15 citation statements)

References 31 publications

“…[12]. The second one is based on the confidence estimation module (CEM) [10], an effective confidence method in ASR. In order to comply with the CEM input format, a two-layer GRU with 128 dimensions is added as the decoder, and the CEM module has one fully-connected layer with 256 units.…”

Section: Data Set and Model Configurations

mentioning

confidence: 99%

“…For better performance, neural confidence estimation methods are drawing wide research interests to date. These works mainly focus on deriving increasingly discriminating set of features for the binary classifier under the specific structure like attention-based sequence-to-sequence models [10] and RNN-T models [11,12]. However, these methods are highly sensitive to the model structure and the set of features extracted.…”

mentioning

confidence: 99%

Neural keyword confidence estimation for open‐vocabulary keyword spotting

Liu, Zhang (2021), Electronics Letters

Despite the recent prevalence of keyword spotting (KWS) in smart homes, open‐vocabulary KWS remains a keen but unmet need among users. Conventional open‐vocabulary KWS systems struggle to achieve a high wake‐up rate and a low false alarm rate simultaneously because they lack specific data for model optimisation. In this letter, a light‐weight neural keyword confidence estimation module (KCEM) for the second detection part in the open‐vocabulary KWS system is proposed, which utilises the transformer structure to calculate the confidence by fusing the keyword embedding and the acoustic feature obtained from the KWS model. The KCEM method is evaluated on a self‐collected open‐vocabulary KWS test set, yielding equally efficient performance compared with typical confidence estimation methods, a reduction in false reject rate by 34% and 29% relative under clean and noisy conditions, respectively, at 0.04 false alarms per hour.

“…However, for end-to-end (E2E) ASR models such as recurrent neural network transducers (RNN-T) and attention-based sequence-to-sequence models, word posteriors cannot be approximated well from the tree-like "lattice" where the prediction of each token conditions on the full history of previous tokens. Autoregressive decoders also tend to be overconfident [24]. To solve this challenge, several model-based methods have been proposed to estimate word and utterance-level confidence for E2E models.…”

Section: Introduction

mentioning

confidence: 99%

“…To solve this challenge, several model-based methods have been proposed to estimate word and utterance-level confidence for E2E models. For examples, [25,24] proposed to train a token-level (e.g. graphemes or word-pieces [26]) confidence estimation module (CEM) on top of a given E2E model and the word-level confidence can be simply obtained by averaging the token-level scores.…”

Section: Introduction

mentioning

confidence: 99%
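The averaging step described in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the function name `word_confidences` and the `word_ids` alignment (one word index per token, assumed to be non-decreasing) are hypothetical, introduced only for illustration.

```python
from itertools import groupby


def word_confidences(token_scores, word_ids):
    """Average token-level confidence scores into word-level scores.

    token_scores: per-token confidences in [0, 1].
    word_ids: for each token, the index of the word it belongs to
              (hypothetical alignment, assumed non-decreasing).
    """
    if len(token_scores) != len(word_ids):
        raise ValueError("token_scores and word_ids must have equal length")

    result = []
    # Consecutive tokens sharing a word index form one word.
    for _, group in groupby(zip(word_ids, token_scores), key=lambda pair: pair[0]):
        scores = [score for _, score in group]
        result.append(sum(scores) / len(scores))
    return result


# Four word-pieces forming three words: the first word has two pieces.
example = word_confidences([0.75, 0.25, 0.5, 1.0], [0, 0, 1, 2])
```

A plain mean treats every token of a word equally; other aggregations (for example taking the minimum, or the score of the last token) are possible, and the statement above only mentions averaging.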

“…Model-based approaches typically formulate the confidence estimation problem as a binary classification task [20,21,22,23,25,24,27,28,29], where correct tokens, words, or utterances should have confidence scores close to 1, and 0 otherwise. For word-level confidence estimation, scores between 1 and 0 are only assigned to words that appear in the hypotheses.…”

Section: Introduction

mentioning

confidence: 99%
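The binary-classification framing in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the method of any cited paper: the correctness flags are assumed to come from a precomputed alignment of the hypothesis against the reference, binary cross-entropy is assumed as the training loss, and the names `bce_loss` and `is_correct` are hypothetical.

```python
import math


def bce_loss(confidences, targets, eps=1e-12):
    """Mean binary cross-entropy between confidence scores and 0/1 targets."""
    total = 0.0
    for p, t in zip(confidences, targets):
        # Clamp to avoid log(0) for scores that are exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(confidences)


# Assumed input: whether each hypothesis token matched the reference.
is_correct = [True, True, False]
# Correct tokens get target 1, incorrect tokens get target 0.
targets = [1.0 if flag else 0.0 for flag in is_correct]

# Scores that agree with the targets give a lower loss than scores that do not.
loss_good = bce_loss([0.9, 0.8, 0.1], targets)
loss_bad = bce_loss([0.2, 0.3, 0.9], targets)
```

Because targets exist only for tokens that appear in the hypothesis, this formulation says nothing about words the recogniser deleted.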

Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction

Qiu, He, Li, et al. (2021), Interspeech 2021

Self Cite

Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems. Recent works have proposed using neural networks to learn word or utterance confidence scores for end-to-end ASR. In those studies, word confidence by itself does not model deletions, and utterance confidence does not take advantage of word-level training signals. This paper proposes to jointly learn word confidence, word deletion, and utterance confidence. Empirical results show that multi-task learning with all three objectives improves confidence metrics (NCE, AUC, RMSE) without the need for increasing the model size of the confidence estimation module. Using the utterance-level confidence for rescoring also decreases the word error rates on Google's Voice Search and Long-tail Maps datasets by 3-5% relative, without needing a dedicated neural rescorer.

Adaptive data augmentation for mandarin automatic speech recognition

Ding, Li, et al. (2024), Appl Intell
