2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2011.5947610
Structured Output Layer neural network language model

Cited by 83 publications (65 citation statements)
References 5 publications
“…Various workarounds have been proposed, relying for instance on a structured output layer using word-classes (Mnih and Hinton, 2008; Le et al., 2011). A different alternative, which however only delivers quasi-normalized scores, is to train the network using Noise Contrastive Estimation, or NCE for short (Gutmann and Hyvärinen, 2010; Mnih and Teh, 2012).…”
Section: Neural Architectures (mentioning)
confidence: 99%
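For readers skimming these excerpts, the two ideas mentioned can be summarized with standard formulations, sketched here from general knowledge rather than quoted from the cited papers. The class-factorized output layer replaces a single softmax over the vocabulary with a softmax over word classes followed by a softmax within the predicted word's class, while NCE replaces normalization with a binary discrimination of data words against k noise samples drawn from a distribution q; s_theta denotes the network's unnormalized score:

P(w_t \mid h_t) = P\big(c(w_t) \mid h_t\big) \cdot P\big(w_t \mid c(w_t), h_t\big)

J_{\mathrm{NCE}}(\theta) = \mathbb{E}_{w \sim \mathrm{data}}\Big[\log \sigma\big(s_\theta(w,h) - \log k\,q(w)\big)\Big] + k\,\mathbb{E}_{w' \sim q}\Big[\log\big(1 - \sigma\big(s_\theta(w',h) - \log k\,q(w')\big)\big)\Big]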
“…As the vocabulary size increases, the size of the weight matrix between the hidden layer and the output layer becomes the dominant factor in the complexity of training. Strategies such as using a shortlist of words [115] or a hierarchical representation of words [101, 100, 83, 84] reduce this complexity. In this thesis, we use the class-based RNNLM architecture introduced in [97].…”
Section: Class-based RNNLMs (mentioning)
confidence: 99%
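As a concrete illustration of the complexity argument in this excerpt, here is a minimal numpy sketch of a class-factorized output layer. The sizes, the random class assignment, and the weight matrices W_class and W_word are illustrative assumptions, not taken from the cited thesis:

import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Illustrative sizes (assumed): hidden size H, vocabulary V, C word classes.
H, V, C = 200, 100_000, 300
rng = np.random.default_rng(0)
word_to_class = rng.integers(0, C, size=V)                 # class assignment per word
class_words = [np.where(word_to_class == c)[0] for c in range(C)]

W_class = rng.standard_normal((C, H)) * 0.01               # hidden -> class logits
W_word = rng.standard_normal((V, H)) * 0.01                # hidden -> word logits, rows used per class

def log_prob(word, hidden):
    """log P(word | hidden) = log P(class | hidden) + log P(word | class, hidden)."""
    c = word_to_class[word]
    p_class = softmax(W_class @ hidden)                    # softmax over C classes only
    members = class_words[c]
    p_in_class = softmax(W_word[members] @ hidden)         # softmax over words in that class only
    idx = np.where(members == word)[0][0]
    return np.log(p_class[c]) + np.log(p_in_class[idx])

# Cost per prediction drops from O(H * V) to roughly O(H * (C + V / C)).
h = rng.standard_normal(H)
print(log_prob(1234, h))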
“…The hierarchical NNLM [101, 100] adopts a binary clustering of the words at the output layer to reduce the computational complexity. Structured output layer NNLMs [83, 84] use another tree representation at the output layer. In this approach, all words except a shortlist of words are clustered based on the distributed representations learned at the projection layer.…”
Section: Neural Network LMs (mentioning)
confidence: 99%
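A rough sketch of the clustering step this excerpt describes, under two stated assumptions: the projection-layer embeddings are available as a numpy array (loaded here from a hypothetical file), and a single level of k-means clustering stands in for the deeper word tree used by SOUL; shortlist_size and n_clusters are made-up values:

import numpy as np
from sklearn.cluster import KMeans

# Assumed input: projection-layer embeddings for the full vocabulary,
# with words sorted by frequency so the first rows are the most frequent.
embeddings = np.load("projection_embeddings.npy")    # shape (V, d), hypothetical file
shortlist_size = 2000                                # most frequent words kept out of the clustering
n_clusters = 500                                     # clusters for the remaining words

shortlist = np.arange(shortlist_size)
rest = np.arange(shortlist_size, embeddings.shape[0])

# Cluster only the out-of-shortlist words on their learned representations,
# mirroring the idea of building the output structure from the projection layer.
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
cluster_of = km.fit_predict(embeddings[rest])

# Shortlist words are predicted directly; every other word is predicted via
# P(cluster | h) * P(word | cluster, h).
word_to_cluster = {int(w): int(c) for w, c in zip(rest, cluster_of)}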
“…Fluency Features: These features measure the 'fluency' of the target sentence and are based on different language models: a 'traditional' 4-gram language model estimated on WMT monolingual and bilingual data (the language model used by our system to generate the pseudo-references); a continuous-space 10-gram language model estimated with SOUL (Le et al., 2011) (also used by our MT system); and a 4-gram language model based on Part-of-Speech sequences. The latter model was estimated on the Spanish side of the bilingual data provided in the translation shared task in 2013.…”
Section: Features (mentioning)
confidence: 99%
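To make such fluency features concrete, below is a toy Python sketch that computes sentence log-probability, length-normalized log-probability, and perplexity from a hard-coded trigram table; the table, the back-off value, and the feature names are assumptions for illustration only, since the cited system obtains these scores from its 4-gram, SOUL, and POS language models:

import math

# Made-up trigram log-probabilities (natural log) for illustration.
trigram_logprob = {
    ("<s>", "la", "casa"): -1.2,
    ("la", "casa", "verde"): -2.3,
    ("casa", "verde", "</s>"): -0.9,
}
BACKOFF = -8.0  # assumed fallback log-probability for unseen trigrams

def fluency_features(tokens):
    tokens = ["<s>"] + tokens + ["</s>"]
    logps = []
    for i in range(2, len(tokens)):
        logps.append(trigram_logprob.get(tuple(tokens[i - 2:i + 1]), BACKOFF))
    total = sum(logps)
    n = len(logps)
    return {
        "logprob": total,                    # sentence log-probability
        "logprob_per_word": total / n,       # length-normalized score
        "perplexity": math.exp(-total / n),  # per-word perplexity
    }

print(fluency_features(["la", "casa", "verde"]))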