2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2011.5947610
Structured Output Layer neural network language model

Cited by 83 publications (65 citation statements)
References 5 publications
“…Various workarounds have been proposed, relying for instance on a structured output layer using word-classes (Mnih and Hinton, 2008; Le et al., 2011). A different alternative, which however only delivers quasi-normalized scores, is to train the network using Noise Contrastive Estimation, or NCE for short (Gutmann and Hyvärinen, 2010; Mnih and Teh, 2012).…”
Section: Neural Architectures (mentioning)
confidence: 99%
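For readers skimming these excerpts, the two ideas mentioned can be summarized with standard formulations, sketched here from general knowledge rather than quoted from the cited papers. The class-factorized output layer replaces a single softmax over the vocabulary with a softmax over word classes followed by a softmax within the predicted word's class, while NCE replaces normalization with a binary discrimination of data words against k noise samples drawn from a distribution q; s_theta denotes the network's unnormalized score:

P(w_t \mid h_t) = P\big(c(w_t) \mid h_t\big) \cdot P\big(w_t \mid c(w_t), h_t\big)

J_{\mathrm{NCE}}(\theta) = \mathbb{E}_{w \sim \mathrm{data}}\Big[\log \sigma\big(s_\theta(w,h) - \log k\,q(w)\big)\Big] + k\,\mathbb{E}_{w' \sim q}\Big[\log\big(1 - \sigma\big(s_\theta(w',h) - \log k\,q(w')\big)\big)\Big]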
“…As the vocabulary size increases, the size of the weight matrix between the hidden layer and the output layer becomes the dominant factor in the complexity of training. Strategies such as using a shortlist of words [115] or a hierarchical representation of words [101, 100, 83, 84] reduce this complexity. In this thesis, we use the class-based RNNLM architecture introduced in [97].…”
Section: Class-based RNNLMs (mentioning)
confidence: 99%
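As a concrete illustration of the complexity argument in this excerpt, here is a minimal numpy sketch of a class-factorized output layer. The sizes, the random class assignment, and the weight matrices W_class and W_word are illustrative assumptions, not taken from the cited thesis:

import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Illustrative sizes (assumed): hidden size H, vocabulary V, C word classes.
H, V, C = 200, 100_000, 300
rng = np.random.default_rng(0)
word_to_class = rng.integers(0, C, size=V)                 # class assignment per word
class_words = [np.where(word_to_class == c)[0] for c in range(C)]

W_class = rng.standard_normal((C, H)) * 0.01               # hidden -> class logits
W_word = rng.standard_normal((V, H)) * 0.01                # hidden -> word logits, rows used per class

def log_prob(word, hidden):
    """log P(word | hidden) = log P(class | hidden) + log P(word | class, hidden)."""
    c = word_to_class[word]
    p_class = softmax(W_class @ hidden)                    # softmax over C classes only
    members = class_words[c]
    p_in_class = softmax(W_word[members] @ hidden)         # softmax over words in that class only
    idx = np.where(members == word)[0][0]
    return np.log(p_class[c]) + np.log(p_in_class[idx])

# Cost per prediction drops from O(H * V) to roughly O(H * (C + V / C)).
h = rng.standard_normal(H)
print(log_prob(1234, h))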
“…The hierarchical NNLM [101, 100] adopts a binary clustering of the words at the output layer to reduce the computational complexity. Structured output layer NNLMs [83, 84] use another tree representation at the output layer. In this approach, all words except a shortlist of words are clustered based on the distributed representations learned at the projection layer.…”
Section: Neural Network LMs (mentioning)
confidence: 99%
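A rough sketch of the clustering step this excerpt describes, under two stated assumptions: the projection-layer embeddings are available as a numpy array (loaded here from a hypothetical file), and a single level of k-means clustering stands in for the deeper word tree used by SOUL; shortlist_size and n_clusters are made-up values:

import numpy as np
from sklearn.cluster import KMeans

# Assumed input: projection-layer embeddings for the full vocabulary,
# with words sorted by frequency so the first rows are the most frequent.
embeddings = np.load("projection_embeddings.npy")    # shape (V, d), hypothetical file
shortlist_size = 2000                                # most frequent words kept out of the clustering
n_clusters = 500                                     # clusters for the remaining words

shortlist = np.arange(shortlist_size)
rest = np.arange(shortlist_size, embeddings.shape[0])

# Cluster only the out-of-shortlist words on their learned representations,
# mirroring the idea of building the output structure from the projection layer.
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
cluster_of = km.fit_predict(embeddings[rest])

# Shortlist words are predicted directly; every other word is predicted via
# P(cluster | h) * P(word | cluster, h).
word_to_cluster = {int(w): int(c) for w, c in zip(rest, cluster_of)}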
“…Fluency Features: These features measure the 'fluency' of the target sentence and are based on different language models: a 'traditional' 4-gram language model estimated on WMT monolingual and bilingual data (the language model used by our system to generate the pseudo-references); a continuous-space 10-gram language model estimated with SOUL (Le et al., 2011) (also used by our MT system); and a 4-gram language model based on Part-of-Speech sequences. The latter model was estimated on the Spanish side of the bilingual data provided in the translation shared task in 2013.…”
Section: Features (mentioning)
confidence: 99%
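To make such fluency features concrete, below is a toy Python sketch that computes sentence log-probability, length-normalized log-probability, and perplexity from a hard-coded trigram table; the table, the back-off value, and the feature names are assumptions for illustration only, since the cited system obtains these scores from its 4-gram, SOUL, and POS language models:

import math

# Made-up trigram log-probabilities (natural log) for illustration.
trigram_logprob = {
    ("<s>", "la", "casa"): -1.2,
    ("la", "casa", "verde"): -2.3,
    ("casa", "verde", "</s>"): -0.9,
}
BACKOFF = -8.0  # assumed fallback log-probability for unseen trigrams

def fluency_features(tokens):
    tokens = ["<s>"] + tokens + ["</s>"]
    logps = []
    for i in range(2, len(tokens)):
        logps.append(trigram_logprob.get(tuple(tokens[i - 2:i + 1]), BACKOFF))
    total = sum(logps)
    n = len(logps)
    return {
        "logprob": total,                    # sentence log-probability
        "logprob_per_word": total / n,       # length-normalized score
        "perplexity": math.exp(-total / n),  # per-word perplexity
    }

print(fluency_features(["la", "casa", "verde"]))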