On training bi-directional neural network language model with noise contrastive estimation

He, Tianxing; Zhang, Yu; Droppo, Jasha; Yu, Kai

doi:10.1109/iscslp.2016.7918423

Cited by 20 publications

(13 citation statements)

References 19 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In contrast, by using both their scores in the DDM (model 8), we can steadily reduce the WER from when using only one of their scores (model 6 or 7). This result confirms that the DDM can effectively exploit complementary features and encourages us to use other types of LSTMLM scores in addition [27][28][29][30][31].…”

Section: Experimental Settingssupporting

confidence: 67%

“…We have improved our DDM for rescoring N -best speech recognition hypothesis lists by using the backward LSTMLM score and ensemble encoders. Future work will include the use of other types of LSTMLM scores [27][28][29][30][31] and the use of a mixture-of-experts framework [36,37]. We also plan to compare our DDM with DLMs [16,17] and apply it to rescoring N -best machine translation hypothesis lists [41,42].…”

Section: Discussionmentioning

confidence: 99%

“…Various types of RNNLMs have been proposed in addition to the forward RNNLM, e.g. backward RNNLMs [5,6,10,26], bidirectional RNNLMs [27][28][29], mixture-of-softmax (MoS) RNNLMs [30], and whole sentence RNNLMs [31], and their effectiveness has been reported. Among these candidates, in this study, we select the backward LSTMLM that exploits the succeeding word sequence to predict the current word.…”

Section: Backward Lstmlmmentioning

confidence: 99%

See 2 more Smart Citations

Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders

et al. 2019

View full text Add to dashboard Cite

We have proposed a neural network (NN) model called a deep duel model (DDM) for rescoring N-best speech recognition hypothesis lists. A DDM is composed of a long short-term memory (LSTM)-based encoder followed by a fully-connected linear layer-based binary-class classifier. Given the feature vector sequences of two hypotheses in an N-best list, the DDM encodes the features and selects the hypothesis that has the lower word error rate (WER) based on the output binary-class probabilities. By repeating this one-on-one hypothesis comparison (duel) for each hypothesis pair in the N-best list, we can find the oracle (lowest WER) hypothesis as the survivor of the duels. We showed that the DDM can exploit the score provided by a forward LSTM-based recurrent NN language model (LSTMLM) as an additional feature to accurately select the hypotheses. In this study, we further improve the selection performance by introducing two modifications, i.e. adding the score provided by a backward LSTMLM, which uses succeeding words to predict the current word, and employing ensemble encoders, which have a high feature encoding capability. By combining these two modifications, our DDM achieves an over 10% relative WER reduction from a strong baseline obtained using both the forward and backward LSTMLMs.

show abstract

Section: Experimental Settingssupporting

confidence: 67%

Section: Discussionmentioning

confidence: 99%

Section: Backward Lstmlmmentioning

confidence: 99%

See 1 more Smart Citation

Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders

et al. 2019

View full text Add to dashboard Cite

show abstract

“…Several previous work [10,11,12] have investigated the possibility to define a "bi-directional language model", by directly replacing the uni-directional LSTM for the conditional probability…”

Section: Related Work and Motivationmentioning

confidence: 99%

Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs

Irie

Lei

Deng³

et al. 2018

Interspeech 2018

View full text Add to dashboard Cite

A combination of forward and backward long short-term memory (LSTM) recurrent neural network (RNN) language models is a popular model combination approach to improve the estimation of the sequence probability in the second pass N-best list rescoring in automatic speech recognition (ASR). In this work, we further push such an idea by proposing a combination of three models: a forward LSTM language model, a backward LSTM language model and a bi-directional LSTM based gap completion model. We derive such a combination method from a forward backward decomposition of the sequence probability. We carry out experiments on the Switchboard speech recognition task. While we empirically find that such a combination gives slight improvements in perplexity over the combination of forward and backward models, we finally show that a combination of the same number of forward models gives the best perplexity and word error rate (WER) overall.

show abstract

“…A simple language model is an ngram [1]. In recent years, recurrent neural network language models (RNNLMs) have consistently surpassed traditional ngrams in ASR and related tasks [2,3,4,5,6].…”

Section: Introductionmentioning

confidence: 99%

Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition

Li¹,

Xu²,

Wang³

et al. 2018

Interspeech 2018

View full text Add to dashboard Cite

We propose two adaptation models for recurrent neural network language models (RNNLMs) to capture topic effects and longdistance triggers for conversational automatic speech recognition (ASR). We use a fast marginal adaptation (FMA) framework to adapt a RNNLM. Our first model is effectively a cache model-the word frequencies are estimated by counting words in a conversation (with utterance-level hold-one-out) from 1stpass decoded word lattices, and then is interpolated with a background unigram distribution. In the second model, we train a deep neural network (DNN) on conversational transcriptions to predict word frequencies given word frequencies from 1stpass decoded word lattices. The second model can in principle model trigger and topic effects but is harder to train. Experiments on three conversational corpora show modest WER and perplexity reductions with both adaptation models.

show abstract

On training bi-directional neural network language model with noise contrastive estimation

Cited by 20 publications

References 19 publications

Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders

Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders

Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs

Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition

Contact Info

Product

Resources

About