2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7179001

Scaling recurrent neural network language models

Abstract: This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set size, computational costs and memory. Our analysis shows that despite being more costly to train, RNNLMs obtain much lower perplexities on standard benchmarks than n-gram models. We train the largest known RNNs and present relative word error rate gains of 18% on an ASR task. W…
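The 18% figure quoted in the abstract is a relative word error rate reduction. As a quick reminder of how such a relative gain is computed, here is a minimal Python sketch; the baseline and improved WER values below are made up for illustration and are not results from the paper.

```python
# Relative WER reduction; the numbers are hypothetical, not from the paper.
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction of new_wer with respect to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer

# A hypothetical baseline at 20.0% WER improved to 16.4% WER:
print(f"{relative_wer_reduction(20.0, 16.4):.1%}")  # -> 18.0%
```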

Cited by 50 publications (41 citation statements) | References 11 publications

“…The neural language model is an LSTM RNN with 2 layers and 1000 hidden states. It has a target vocabulary of 100K and is trained using noise-contrastive estimation (Mnih and Teh, 2012; Vaswani et al., 2013; Baltescu and Blunsom, 2015; Williams et al., 2015). Additionally, it is trained using dropout with a dropout probability of 0.2, as suggested by Zaremba et al. (2014).…”
Section: Re-scoring Results (mentioning)
confidence: 99%
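For concreteness, the configuration described in the excerpt above (a 2-layer LSTM with 1000 hidden units, a 100K-word output vocabulary, and dropout of 0.2) can be sketched in PyTorch as follows. This is only a sketch of the stated architecture, not the cited implementation: it uses a full softmax output layer rather than noise-contrastive estimation, and the embedding size is an assumption.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """2-layer LSTM LM with 1000 hidden units, 100K vocabulary, dropout 0.2."""
    def __init__(self, vocab_size=100_000, embed_dim=1000, hidden_dim=1000,
                 num_layers=2, dropout=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.drop = nn.Dropout(dropout)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers,
                            dropout=dropout, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) word indices
        x = self.drop(self.embed(tokens))
        h, state = self.lstm(x, state)
        return self.out(self.drop(h)), state  # logits over the vocabulary

model = LSTMLanguageModel()
logits, _ = model(torch.randint(0, 100_000, (4, 20)))  # (4, 20, 100000)
```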
“…Morin and Bengio (2005) propose hierarchical softmax, where at each step log₂ V binary classifications are performed instead of a single classification over a large number of classes. Gutmann and Hyvärinen (2010) propose noise-contrastive estimation, which discriminates between positive labels and k (k ≪ V) negative labels sampled from a noise distribution, and has been applied successfully to natural language processing tasks (Mnih and Teh, 2012; Vaswani et al., 2013; Williams et al., 2015; Zoph et al., 2016). Although these two approaches provide good speedups for training, they still suffer at test time.…”
Section: Introduction (mentioning)
confidence: 99%
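As a concrete reference for the noise-contrastive estimation objective mentioned in the excerpt above (discriminating each true next word from k ≪ V words sampled from a noise distribution), here is a minimal PyTorch sketch of the per-word NCE loss in the spirit of Mnih and Teh (2012). The function and tensor names are illustrative and not taken from any cited code.

```python
import torch
import torch.nn.functional as F

def nce_loss(true_score, noise_scores, true_logq, noise_logq, k):
    """Noise-contrastive estimation loss for one batch of target words.

    true_score:   (batch,)    unnormalised model score s(w) for the true word
    noise_scores: (batch, k)  scores for k words sampled from the noise dist. q
    true_logq, noise_logq:    log q(.) of those words under the noise dist.
    """
    log_k = torch.log(torch.tensor(float(k)))
    # Classify the observed word as "data" ...
    pos = F.logsigmoid(true_score - (log_k + true_logq))
    # ... and each of the k sampled words as "noise".
    neg = F.logsigmoid(-(noise_scores - (log_k + noise_logq))).sum(dim=1)
    return -(pos + neg).mean()

# Example with random scores, k = 25 noise samples, and a uniform noise
# distribution over a hypothetical 10,000-word vocabulary (log q ≈ -9.21).
batch, k = 8, 25
loss = nce_loss(torch.randn(batch), torch.randn(batch, k),
                torch.full((batch,), -9.21), torch.full((batch, k), -9.21), k)
```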
“…We thus combined our recently released "Eesen" ASR toolkit [21] with an open-source "transcriber" [22] and segmentation [23] framework, and added additional functionality, releasing the system with English acoustic models trained on TEDLIUM data [24] and a large general-purpose language model [25].…”
Section: The "Eesen Transcriber" VM (mentioning)
confidence: 99%
“…The "Eesen Transcriber" is currently distributed with a TEDLIUM version 2 [24] system, which delivers a WER of less than 15% on the development set, using the pruned Cantab LM [25]. It decodes about four times faster than real-time on a single processor, and can be retrained from scratch within 6 days on a single GPU.…”
Section: Performance and Characteristics (mentioning)
confidence: 99%