2013 IEEE International Conference on Acoustics, Speech and Signal Processing 2013
DOI: 10.1109/icassp.2013.6639103

Predicting speech recognition confidence using deep learning with word identity and score features

Abstract: Confidence classifiers for automatic speech recognition (ASR) provide a quantitative representation for the reliability of ASR decoding. In this paper, we improve the ASR confidence measure performance for an utterance using two distinct approaches: (1) to define and incorporate additional predictors in the confidence classifier including those based on the word identity and on the aggregated words, and (2) to train the confidence classifier built on deep learning architectures including the deep neural networ…

citations
Cited by 30 publications
(19 citation statements)
references
References 13 publications
“…We found significant differences in speech recognition confidence scores between SAT and DSAT sessions, as well as between those with SAT and DSAT quality of ASR and intent classification. This suggests that Huang et al's approach [12] is not only effective for predicting ASR quality of utterances, but also correlate well with user ratings of ASR quality at session level for intelligent assistant tasks. This is also why it is effective for predicting user satisfaction and intent classification considering the correlations between them.…”
Section: Acoustic Features (mentioning)
confidence: 88%
“…Besides, we adopt Huang et al's method [12] to measure ASR confidence and use the confidence of the voice requests as a feature. In short, ASR confidence gets higher when both acoustic and language model scores of the selected recognition hypothesis are significantly higher than the remaining hypotheses.…”
Section: Acoustic Features (mentioning)
confidence: 99%
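The intuition in the statement above (confidence rises when the selected hypothesis outscores the remaining ones) can be illustrated with a minimal sketch. This is not the classifier from the cited paper: the function name `nbest_confidence`, the `lm_weight` parameter, and the softmax-style normalization over an N-best list are all assumptions chosen only to show how a score margin can translate into a value between 0 and 1.

```python
import math


def nbest_confidence(acoustic_scores, lm_scores, lm_weight=1.0):
    """Illustrative confidence for the best hypothesis in an N-best list.

    Both inputs are log-domain scores, one entry per hypothesis.
    The combination rule and normalization are assumptions made for
    illustration, not the method of the cited paper.
    """
    if not acoustic_scores or len(acoustic_scores) != len(lm_scores):
        raise ValueError("score lists must be non-empty and of equal length")

    # Combine acoustic and language model log-scores per hypothesis.
    combined = [a + lm_weight * l for a, l in zip(acoustic_scores, lm_scores)]

    # Normalized share of the best hypothesis; subtracting the maximum
    # keeps the exponentials numerically stable.
    best = max(combined)
    denom = sum(math.exp(s - best) for s in combined)
    return 1.0 / denom


# A clear winner versus a close race between two hypotheses.
clear_margin = nbest_confidence([-10.0, -20.0], [-5.0, -9.0])
close_race = nbest_confidence([-10.0, -10.2], [-5.0, -5.1])
print(clear_margin > close_race)
```

With a single hypothesis the function returns 1.0, since there is nothing to compete with; a practical system would combine such a margin with other predictors rather than use it alone.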
“…In recent years, the classifier-based approach has directly benefited from the use of deep learning models outperforming the most accurate earlier classifiers such as CRF [5], [6], [7]. In a first proposal, DNN and kernel deep convex networks (K-DCN) were applied at the utterance level to discriminate between in-grammar and out-of-grammar utterances [13]. In later research, RNN have demonstrated outstanding performance in word-level CE [5], [6], [7], [14].…”
Section: Recent Work In Confidence Estimation (mentioning)
confidence: 99%
“…Word embeddings are also fed into the first hidden layer, since word identities have shown to be very useful in improving CE [2], [4], [5], [13], [14], [24], [25]. To this end, we have not used a conventional one-hot encoding, as this would make the number of parameters grow linearly with the vocabulary size V .…”
Section: Speaker-adapted Confidence Measures Using Deep Bidirect (mentioning)
confidence: 99%
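The parameter-growth argument in the statement above can be checked with simple arithmetic. The sketch below assumes a fully connected first hidden layer and word embeddings that are precomputed and kept fixed, so only the input-to-hidden weights are counted; the sizes `V`, `d`, and `H` are hypothetical values, not figures from the citing paper.

```python
def first_layer_weights(input_dim, hidden_units):
    """Number of weights connecting an input vector to a dense hidden layer."""
    return input_dim * hidden_units


# Hypothetical sizes, chosen only for illustration.
V = 50_000  # vocabulary size
d = 100     # embedding dimension (embeddings assumed precomputed and fixed)
H = 512     # units in the first hidden layer

one_hot_weights = first_layer_weights(V, H)    # grows linearly with V
embedding_weights = first_layer_weights(d, H)  # independent of V

print(one_hot_weights, embedding_weights)
```

Under these assumptions the one-hot input needs V × H weights, while the embedding input needs d × H regardless of vocabulary size. If the embeddings were trained jointly, the embedding table itself would add V × d parameters.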
“…Feedforward NNs have also been applied, e.g. [14, 18, 19, 20] (in [19], an RNNLM is used as a feature extractor, but the classifier is a feedforward NN). However, to the best of our knowledge, there seems to be no study that directly applies RNNs as the classifiers for ASR error detection.…”
Section: Introduction (mentioning)
confidence: 99%