IEEE International Conference on Acoustics, Speech and Signal Processing, 2002
DOI: 10.1109/icassp.2002.1005852

Connectionist language modeling for large vocabulary continuous speech recognition

Abstract: This paper describes ongoing work on a new approach to language modeling for large vocabulary continuous speech recognition. Almost all state-of-the-art systems use statistical n-gram language models estimated on text corpora. One principal problem with such language models is that many of the n-grams are never observed even in very large training corpora, and therefore it is common to back off to a lower-order model. In this paper we propose to address this problem by carrying out the estimation task…
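The back-off mechanism mentioned in the abstract can be made concrete with a short sketch. The snippet below assumes a generic "stupid back-off"-style constant weight alpha and Counter-based n-gram counts; it illustrates the fallback chain from trigram to bigram to unigram, not the exact discounting used in the paper's baseline.

    # Minimal sketch of back-off estimation, assuming constant
    # "stupid back-off"-style weighting (alpha); Counter returns 0
    # for unseen n-grams, which triggers the fallback.
    from collections import Counter

    def backoff_prob(w1, w2, w3, tri, bi, uni, alpha=0.4):
        """P(w3 | w1 w2): use the trigram if observed, else back off."""
        if tri[(w1, w2, w3)] > 0:
            return tri[(w1, w2, w3)] / bi[(w1, w2)]
        if bi[(w2, w3)] > 0:                       # back off to bigram
            return alpha * bi[(w2, w3)] / uni[w2]
        total = sum(uni.values())                  # unigram floor
        return alpha * alpha * uni[w3] / total if total else 0.0

    # Usage: tri, bi, uni are Counter objects filled from a training corpus.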

Cited by 48 publications (30 citation statements); references 4 publications.

“…This formulation applies to a discriminant variant of the RBM called the Discriminative RBM. Such conditional energy-based models have also been exploited in a series of probabilistic language models based on neural networks (Bengio et al., 2001; Schwenk & Gauvain, 2002; Bengio, Ducharme, Vincent, & Jauvin, 2003; Xu, Emami, & Jelinek, 2003; Schwenk, 2004; Schwenk & Gauvain, 2005; Mnih & Hinton, 2009). That formulation (or generally when it is easy to sum or maximize over the set of values of the terms of the partition function) has been explored at length (LeCun & Huang, 2005; LeCun et al., 2006).…”
Section: Conditional Energy-based Models
confidence: 99%
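To make the partition-function remark in this statement concrete: in such neural language models the energy is a per-word score, and the normalizer Z(h) is a tractable sum over the finite vocabulary. A minimal NumPy sketch, with assumed shapes and variable names:

    import numpy as np

    def next_word_distribution(h, W, b):
        """P(w | h) from energies E(w, h) = -(W @ h + b)[w]; the
        partition function Z(h) is an explicit sum over the vocabulary."""
        scores = W @ h + b              # one score (-energy) per word
        scores -= scores.max()          # numerical stability
        unnorm = np.exp(scores)         # exp(-E(w, h))
        return unnorm / unnorm.sum()    # divide by Z(h)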
“…The idea of distributed representation is an old idea in machine learning and neural networks research (Hinton, 1986; Rumelhart et al., 1986a; Miikkulainen & Dyer, 1991; Bengio, Ducharme, & Vincent, 2001; Schwenk & Gauvain, 2002), and it may be of help in dealing with the curse of dimensionality and the limitations of local generalization. A cartoon local representation for integers i ∈ {1, 2, .…”
Section: Learning Distributed Representations
confidence: 99%
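The local-versus-distributed contrast in this quote can be illustrated in a few lines; the vocabulary size, embedding dimension, and random lookup table below are illustrative assumptions:

    import numpy as np

    V, d = 10000, 64                    # vocabulary size, feature dimension
    one_hot = np.zeros(V)
    one_hot[42] = 1.0                   # local code: a single active unit
    E = 0.01 * np.random.randn(V, d)    # lookup table (learned in practice)
    distributed = E[42]                 # d shared, real-valued features
    # Because words share feature dimensions in E, the model can generalize
    # to n-grams never seen in training, unlike the purely local code.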
“…ing (see e.g. [3], [4], and [5]). However, there are fundamental differences in the way neural networks have previously been applied to speech recognition tasks.…”
Section: Introduction
confidence: 99%
“…So one straightforward solution to make the network work faster is to reduce the output vocabulary size. For example, in word error rate (WER) experiments the output vocabulary can be limited to a certain number of the most frequent words, which would be a fraction of the actual vocabulary (Schwenk & Gauvain, 2002). Both the training and evaluation time are reduced proportionally with the reduction in output vocabulary size.…”
Section: Vocabulary Limitation
confidence: 99%
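The shortlist idea described in this statement can be sketched as follows; K and the sorted-by-frequency assumption are illustrative, and the step where probability mass for out-of-shortlist words is redistributed to a back-off n-gram model is omitted:

    import numpy as np

    def shortlist_softmax(h, W, b, K):
        """Softmax over only the K most frequent words (rows of W are
        assumed sorted by training-corpus frequency)."""
        scores = W[:K] @ h + b[:K]      # O(K*d) instead of O(V*d)
        scores -= scores.max()          # numerical stability
        p = np.exp(scores)
        return p / p.sum()              # distribution over the shortlist only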