Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.592

Learning Architectures from an Extended Search Space for Language Modeling

Abstract: Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learn both intra-cell and inter-cell architectures (call it ESS). For a better search result, we design a joint learning method to perform intra-cell and inter-cell NAS simultaneously. We implement our model in a differentiable architecture…
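
The abstract describes a differentiable, DARTS-style relaxation in which intra-cell and inter-cell choices are searched jointly. As a rough illustration only, the sketch below shows the standard softmax-weighted mixed operation that such methods build on; the candidate operation set, the MixedEdge module, and the wiring between cells are assumptions made here for illustration (PyTorch assumed), not the authors' ESS search space or released code.

```python
# Illustrative sketch only (assumes PyTorch): a DARTS-style mixed edge,
# i.e. a softmax-weighted sum over candidate operations. The candidate set
# and the intra-/inter-cell wiring below are invented for illustration and
# do not reproduce the ESS search space from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical candidate operations for one edge of the search graph.
CANDIDATE_OPS = {
    "identity":    lambda dim: nn.Identity(),
    "linear_relu": lambda dim: nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
    "linear_tanh": lambda dim: nn.Sequential(nn.Linear(dim, dim), nn.Tanh()),
}

class MixedEdge(nn.Module):
    """One searchable edge: output = sum_k softmax(alpha)_k * op_k(x)."""
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([build(dim) for build in CANDIDATE_OPS.values()])
        # Architecture parameters (one logit per candidate operation),
        # trained by gradient descent alongside the network weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

dim = 32
inter_edge = MixedEdge(dim)  # hypothetical: how a previous cell's output is routed in
intra_edge = MixedEdge(dim)  # hypothetical: how nodes combine inside the cell

x_prev = torch.randn(8, dim)        # stand-in for an earlier cell's output
y = intra_edge(inter_edge(x_prev))

# A single backward pass yields gradients for both alpha vectors, which is
# the sense in which inter-cell and intra-cell architectures can be searched
# jointly within one differentiable objective.
loss = y.pow(2).mean()
loss.backward()
print(inter_edge.alpha.grad.shape, intra_edge.alpha.grad.shape)
```

In DARTS-style methods of this kind, the discrete architecture is typically read off after search by keeping the highest-weighted operation on each edge.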

Cited by 13 publications (10 citation statements, 2020–2023) | References 23 publications
“…A range of models have been developed, with progressively larger models trained on more data (e.g., Dai et al., 2019). Variations of the LSTM have consistently achieved state-of-the-art performance without massive compute resources (Merity et al., 2018a; Melis et al., 2019; Merity, 2019; Li et al., 2020). We use the AWD-LSTM (Merity et al., 2018b) in our experiments, as it achieves very strong performance, has a well-studied codebase, and can be trained on a single GPU in a day.…”
Section: Personalization (mentioning)
confidence: 99%
“…We focus on a single model type for computational budget reasons. We chose an LSTM because while transformer-based models such as GPT-2 now dominate transfer learning, LSTMs continue to be competitive in language modeling (Du et al., 2020; Li et al., 2020; Melis et al., 2018; Merity et al., 2017a). Our ideas are orthogonal to this prior work and our findings may apply to transformers as well, but confirming that would require additional experiments.…”
Section: Related Work (mentioning)
confidence: 99%
“…NAS methods have shown strong performance on many NLP and CV tasks, such as language modeling and image classification (Zoph and Le, 2017; Pham et al., 2018; Luo et al., 2018; Liu et al., 2019). Applications in NLP, such as NER (Jiang et al., 2019; Li et al., 2020), translation (So et al., 2019), text classification (Wang et al., 2020), and natural language inference (NLI) (Pasunuru and Bansal, 2019; Wang et al., 2020) have also been explored.…”
Section: Related Work (mentioning)
confidence: 99%
“…Current SOTA approaches focus on learning new cell architectures as replacements for LSTM or convolutional cells (Zoph and Le, 2017; Pham et al., 2018; Liu et al., 2019; Jiang et al., 2019; Li et al., 2020) or entire model architectures to replace hand-designed models such as the Transformer or DenseNet (So et al., 2019; Pham et al., 2018).…”
Section: Related Work (mentioning)
confidence: 99%