Interspeech 2016
DOI: 10.21437/interspeech.2016-422

Sequential Recurrent Neural Networks for Language Modeling

Abstract: Feedforward Neural Network (FNN)-based language models estimate the probability of the next word based on the history of the last N words, whereas Recurrent Neural Networks (RNN) perform the same task based only on the last word and some context information that cycles in the network. This paper presents a novel approach, which bridges the gap between these two categories of networks. In particular, we propose an architecture which takes advantage of the explicit, sequential enumeration of the word history in …
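To make the contrast in the abstract concrete, here is a minimal sketch (not the paper's code) of the two conditioning schemes: an FNN language model that scores the next word from a fixed window of the last N-1 words, and an RNN language model that sees only the last word plus a hidden state that cycles through the network. All sizes, weights, and function names (fnn_next_word_probs, rnn_step) are illustrative assumptions.

```python
# Illustrative sketch of FNN vs. RNN next-word prediction; sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
V, E, H, N = 1000, 32, 64, 4          # vocab size, embedding dim, hidden dim, n-gram order

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

emb   = rng.normal(scale=0.1, size=(V, E))   # shared word embeddings
W_out = rng.normal(scale=0.1, size=(H, V))   # shared output projection

# --- Feedforward (N-gram) LM: P(w_t | w_{t-N+1}, ..., w_{t-1}) ---
W_ff = rng.normal(scale=0.1, size=((N - 1) * E, H))

def fnn_next_word_probs(history):
    """history: the last N-1 word ids (the fixed context window)."""
    x = np.concatenate([emb[w] for w in history])   # concatenated window embeddings
    h = np.tanh(x @ W_ff)
    return softmax(h @ W_out)

# --- Recurrent LM: P(w_t | w_{t-1}, h_{t-1}) ---
W_in  = rng.normal(scale=0.1, size=(E, H))
W_rec = rng.normal(scale=0.1, size=(H, H))

def rnn_step(prev_word, h_prev):
    """One step: condition only on the previous word and the recycled hidden state."""
    h = np.tanh(emb[prev_word] @ W_in + h_prev @ W_rec)
    return softmax(h @ W_out), h

# Usage on a toy word-id sequence.
seq = [5, 17, 256, 42, 7]
p_fnn = fnn_next_word_probs(seq[-(N - 1):])   # sees only the last 3 words
h = np.zeros(H)
for w in seq:
    p_rnn, h = rnn_step(w, h)                 # sees last word + recurrent state
```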

Cited by 3 publications (7 citation statements), published between 2017 and 2021
References 15 publications

“…This conclusion shows, similarly to other work e.g. [15,13], that recurrent models can be further improved using N-gram/feedforward information, given that they model different linguistic features. Fig.…”
Section: PTB Experiments (supporting)
Confidence: 85%
“…This category typically leads to a significant increase in the number of parameters when combining multiple models. In a first attempt to circumvent these problems, we have recently proposed an SRNN model [15], which combines FFN information and RNN through additional sequential connections at the hidden layer. Although SRNN was successful and did not noticeably suffer from the aforementioned problems, it was solely designed to combine RNN and FNN and is, therefore, not well-suited for other architectures.…”
Section: Model Combination for Language Modeling (mentioning)
Confidence: 99%
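The quoted description of SRNN, which combines FNN information and RNN "through additional sequential connections at the hidden layer", suggests a hidden layer that receives both the usual recurrent path and extra connections from the explicit window of the last N-1 words. The sketch below follows that reading; it is an illustrative assumption about the general idea, not the paper's actual equations or parameterization.

```python
# Hedged sketch: a hidden state fed by (a) the standard recurrent path over the last
# word and (b) extra connections from the concatenated N-1 word window (the "FNN part").
import numpy as np

rng = np.random.default_rng(1)
V, E, H, N = 1000, 32, 64, 4

emb    = rng.normal(scale=0.1, size=(V, E))
W_in   = rng.normal(scale=0.1, size=(E, H))            # recurrent input weights (last word)
W_rec  = rng.normal(scale=0.1, size=(H, H))            # standard recurrence
W_seq  = rng.normal(scale=0.1, size=((N - 1) * E, H))  # extra connections from the word window
W_out  = rng.normal(scale=0.1, size=(H, V))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def srnn_like_step(window, h_prev):
    """window: the last N-1 word ids (oldest first); h_prev: previous hidden state."""
    x_last = emb[window[-1]]                            # RNN part: last word only
    x_win  = np.concatenate([emb[w] for w in window])   # FNN part: explicit history window
    h = np.tanh(x_last @ W_in + h_prev @ W_rec + x_win @ W_seq)
    return softmax(h @ W_out), h

# Usage on a toy window of word ids.
h = np.zeros(H)
probs, h = srnn_like_step([5, 17, 256], h)
```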
“…However, the likelihood of a word is also determined by linguistic material outside the ngram window, that is, in a preceding sentence, or even by extralinguistic context. Effects of linguistic expressions in prior discourse can be taken into account with more recent and advanced language modeling techniques [12][13][14][15][16][17][18], but since these models are trained on text corpora too, they do not take into account extralinguistic context. Modeling effects of extralinguistic context is particularly important in absence of linguistic context, i.e.…”
Section: Introduction (mentioning)
Confidence: 99%
“…The LTCB data split and processing is the same as the one used in [19,20]. In particular, the LTCB vocabulary is limited to the 80K most frequent words with all remaining words replaced by <unk>.…”
Section: Experiments and Results (mentioning)
Confidence: 99%
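The preprocessing described in this quote, limiting the vocabulary to the 80K most frequent words and mapping everything else to <unk>, can be sketched as follows. Tokenization and the actual LTCB splits are not shown, and the helper names are illustrative.

```python
# Minimal sketch of vocabulary capping with <unk> replacement.
from collections import Counter

VOCAB_SIZE = 80_000
UNK = "<unk>"

def build_vocab(tokenized_sentences, size=VOCAB_SIZE):
    """Keep only the `size` most frequent word types."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    return {w for w, _ in counts.most_common(size)}

def apply_unk(tokenized_sentences, vocab):
    """Replace every out-of-vocabulary token with <unk>."""
    return [[tok if tok in vocab else UNK for tok in sent] for sent in tokenized_sentences]

# Usage on a toy corpus with a deliberately small vocabulary cap.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "rare", "word"]]
vocab = build_vocab(corpus, size=4)
print(apply_unk(corpus, vocab))
```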