2017
DOI: 10.48550/arxiv.1708.02182
Preprint

Regularizing and Optimizing LSTM Language Models

Abstract: Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose the weight-dropped LSTM which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization. Furth…
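As context for the abstract's central idea, here is a minimal PyTorch sketch of DropConnect applied to the hidden-to-hidden (recurrent) weights of a single LSTM layer. The class and parameter names are illustrative assumptions, not the authors' released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDroppedLSTMCell(nn.Module):
    """Sketch of DropConnect on the recurrent (hidden-to-hidden) weights of one
    LSTM layer. Illustrative only; not the paper's released code."""

    def __init__(self, input_size, hidden_size, weight_dropout=0.5):
        super().__init__()
        self.hidden_size = hidden_size
        self.weight_dropout = weight_dropout
        # Input-to-hidden and hidden-to-hidden weights for the four LSTM gates.
        self.w_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size) * 0.1)
        self.w_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))

    def forward(self, x, state):
        # x: (batch, seq_len, input_size); state: (h, c), each (batch, hidden_size)
        h, c = state
        # DropConnect: drop individual recurrent *weights* once per call,
        # so the same mask is reused at every timestep of the sequence.
        w_hh = F.dropout(self.w_hh, p=self.weight_dropout, training=self.training)
        outputs = []
        for t in range(x.size(1)):
            gates = x[:, t] @ self.w_ih.t() + h @ w_hh.t() + self.bias
            i, f, g, o = gates.chunk(4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outputs.append(h)
        return torch.stack(outputs, dim=1), (h, c)
```

Because the weight matrix itself is dropped rather than the activations, the mask is shared across all timesteps, which is what distinguishes this form of recurrent regularization from applying ordinary dropout to the hidden states.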

Cited by 166 publications (296 citation statements)
References 26 publications
“…Instead of taking the true gradients, one can train using gradients clipped in some way. This has proven to be of use in a variety of domains [Kaiser and Sutskever, 2015, Merity et al., 2017, Gehring et al., 2017] but will not fix all problems. As before, this calculation of the gradient is biased.…”
Section: Gradient Clipping
confidence: 99%
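A minimal illustration of the kind of clipped-gradient training step the quoted passage refers to, using PyTorch's norm-based clipping; the model, loss, learning rate, and clip threshold are placeholder assumptions, and the cited works differ in the exact clipping scheme they use:

```python
import torch
import torch.nn as nn

# Placeholder model and optimiser; only the clipping call is the point here.
model = nn.LSTM(input_size=64, hidden_size=128)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
loss_fn = nn.MSELoss()

def training_step(inputs, targets, clip_norm=0.25):
    optimizer.zero_grad()
    outputs, _ = model(inputs)
    loss = loss_fn(outputs, targets)
    loss.backward()
    # Rescale the whole gradient vector if its L2 norm exceeds clip_norm;
    # as the quoted passage notes, the resulting gradient estimate is biased.
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    return loss.item()

# Example shapes: (seq_len, batch, input_size) and a matching target.
x = torch.randn(35, 20, 64)
y = torch.randn(35, 20, 128)
print(training_step(x, y))
```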
“…It employs three novel techniques for fine-tuning the language models for various NLP tasks, which are discriminative fine-tuning, slanted triangular learning rates (STLR) and gradual unfreezing. AWD-LSTM language model [38,39],…”
Section: ULMFiT
confidence: 99%
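For reference, a small sketch of the slanted triangular learning rate (STLR) schedule mentioned in the quote: a short linear warm-up followed by a longer linear decay. The hyperparameter values below are illustrative defaults, not prescribed ones:

```python
def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular schedule: linear warm-up for the first cut_frac of
    training, then linear decay down to lr_max / ratio."""
    cut = int(total_steps * cut_frac)
    if t < cut:
        p = t / cut                                        # warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))     # decay phase
    return lr_max * (1 + p * (ratio - 1)) / ratio

# Example: with 1000 steps and cut_frac = 0.1, the rate peaks at step 100
# and decays over the remaining 900 steps.
schedule = [slanted_triangular_lr(t, 1000) for t in range(1000)]
```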
“…Our work is similar to [18] in that the bias information is encoded into a trie structure. However, they require additional training of an RNN-T [19] and an LSTM-LM [20] to utilise bias information, whereas our method requires only a list of keywords we are interested in.…”
Section: Introduction
confidence: 99%
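The quote contrasts trie-based keyword biasing with approaches that need extra model training. Below is a generic sketch of encoding a keyword list into a trie; the helper names are hypothetical and this is not the cited paper's exact data structure:

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # maps a token/character to the next node
        self.is_keyword_end = False

def build_keyword_trie(keywords):
    """Insert each keyword (as a sequence of characters or tokens) into a trie."""
    root = TrieNode()
    for word in keywords:
        node = root
        for token in word:
            node = node.children.setdefault(token, TrieNode())
        node.is_keyword_end = True
    return root

# Example: bias toward a few terms without retraining any model.
trie = build_keyword_trie(["berlin", "bert", "lstm"])
```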