Interspeech 2019
DOI: 10.21437/interspeech.2019-1949
Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders

Abstract: We have proposed a neural network (NN) model called a deep duel model (DDM) for rescoring N-best speech recognition hypothesis lists. A DDM is composed of a long short-term memory (LSTM)-based encoder followed by a fully-connected linear layer-based binary-class classifier. Given the feature vector sequences of two hypotheses in an N-best list, the DDM encodes the features and selects the hypothesis that has the lower word error rate (WER) based on the output binary-class probabilities. By repeating this one-o…
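The rescoring procedure the abstract describes — repeated one-on-one duels over an N-best list, keeping whichever hypothesis the binary classifier judges to have the lower WER — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `duel` callable stands in for the DDM's binary-class classifier, and the hypothesis names and WER values are fabricated for the example.

```python
def rescore_nbest(hypotheses, duel):
    """Pick the best hypothesis from an N-best list via repeated one-on-one duels.

    `duel(a, b)` should return True when hypothesis `a` is judged to have the
    lower word error rate than `b`. In the paper this judgment comes from the
    DDM's binary-class output probabilities; here it is an abstract callable
    (an assumption made for illustration).
    """
    best = hypotheses[0]
    for challenger in hypotheses[1:]:
        # Keep the winner of each duel as the running best hypothesis.
        if duel(challenger, best):
            best = challenger
    return best


# Toy usage: an oracle duel based on known WERs (fabricated numbers,
# purely to exercise the selection loop).
wer = {"hyp_a": 0.12, "hyp_b": 0.08, "hyp_c": 0.15}
best = rescore_nbest(list(wer), lambda a, b: wer[a] < wer[b])
```

With an oracle duel like this the loop simply finds the minimum-WER hypothesis; the paper's contribution is learning a duel function that approximates that judgment from encoded hypothesis features.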

Cited by 8 publications (5 citation statements) · References 33 publications
“…There have been several attempts to mitigate ASR error propagation in text-based pipelines. One straightforward idea is to correct the ASR output, using error correction models (Weng et al., 2020; Tam et al., 2014) or by ranking n-best hypotheses (Ogawa et al., 2018, 2019; Fohr and Illina, 2021). Another approach is to leverage extra information from the ASR output: lattices (Ladhak et al., 2016; Chen, 2019, 2020), n-best hypotheses (Morbini et al., 2012; Li et al., 2020; Liu et al., 2021), or word confusion networks/embeddings (Tür et al., 2002; Shivakumar et al., 2019).…”
Section: Related Work
confidence: 99%
“…[20] formalize N-best list rescoring as a learning problem and use a wide range of features with automatically optimized weights. [13,14] introduce N-best rescoring through an LSTM-based encoder network followed by a fully-connected feed-forward NN-based binary-class classifier. [19] propose a bi-directional LM for rescoring and utilize the word prediction capability of BERT [3,24].…”
Section: Introduction
confidence: 99%
“…[14] introduce a deep duel model composed of an LSTM-based encoder followed by a fully-connected linear layer and a binary classifier. In [15], this approach is improved by employing ensemble encoders, which have a more powerful encoding capability. [18] adapt BERT [3,23] to sentence scoring, in which the left and right representations are mixed with a bidirectional language model.…”
Section: Introduction
confidence: 99%