Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.592

Learning Architectures from an Extended Search Space for Language Modeling

Abstract: Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell. In this paper, we extend the search space of NAS. In particular, we present a general approach to learn both intra-cell and inter-cell architectures (call it ESS). For a better search result, we design a joint learning method to perform intra-cell and inter-cell NAS simultaneously. We implement our model in a differentiable architecture…
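
The abstract describes a differentiable, DARTS-style relaxation in which intra-cell and inter-cell choices are searched jointly. As a rough illustration only, the sketch below shows the standard softmax-weighted mixed operation that such methods build on; the candidate operation set, the MixedEdge module, and the wiring between cells are assumptions made here for illustration (PyTorch assumed), not the authors' ESS search space or released code.

```python
# Illustrative sketch only (assumes PyTorch): a DARTS-style mixed edge,
# i.e. a softmax-weighted sum over candidate operations. The candidate set
# and the intra-/inter-cell wiring below are invented for illustration and
# do not reproduce the ESS search space from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical candidate operations for one edge of the search graph.
CANDIDATE_OPS = {
    "identity":    lambda dim: nn.Identity(),
    "linear_relu": lambda dim: nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
    "linear_tanh": lambda dim: nn.Sequential(nn.Linear(dim, dim), nn.Tanh()),
}

class MixedEdge(nn.Module):
    """One searchable edge: output = sum_k softmax(alpha)_k * op_k(x)."""
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([build(dim) for build in CANDIDATE_OPS.values()])
        # Architecture parameters (one logit per candidate operation),
        # trained by gradient descent alongside the network weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

dim = 32
inter_edge = MixedEdge(dim)  # hypothetical: how a previous cell's output is routed in
intra_edge = MixedEdge(dim)  # hypothetical: how nodes combine inside the cell

x_prev = torch.randn(8, dim)        # stand-in for an earlier cell's output
y = intra_edge(inter_edge(x_prev))

# A single backward pass yields gradients for both alpha vectors, which is
# the sense in which inter-cell and intra-cell architectures can be searched
# jointly within one differentiable objective.
loss = y.pow(2).mean()
loss.backward()
print(inter_edge.alpha.grad.shape, intra_edge.alpha.grad.shape)
```

In DARTS-style methods of this kind, the discrete architecture is typically read off after search by keeping the highest-weighted operation on each edge.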

Cited by 13 publications (10 citation statements, 2020–2023) | References 23 publications
“…A range of models have been developed, with progressively larger models trained on more data (e.g., Dai et al., 2019). Variations of the LSTM have consistently achieved state-of-the-art performance without massive compute resources (Merity et al., 2018a; Melis et al., 2019; Merity, 2019; Li et al., 2020). We use the AWD-LSTM (Merity et al., 2018b) in our experiments, as it achieves very strong performance, has a well-studied codebase, and can be trained on a single GPU in a day.…”
Section: Personalization (mentioning)
confidence: 99%
“…We focus on a single model type for computational budget reasons. We chose an LSTM because while transformer-based models such as GPT-2 now dominate transfer learning, LSTMs continue to be competitive in language modeling (Du et al., 2020; Li et al., 2020; Melis et al., 2018; Merity et al., 2017a). Our ideas are orthogonal to this prior work and our findings may apply to transformers as well, but confirming that would require additional experiments.…”
Section: Related Work (mentioning)
confidence: 99%
“…NAS methods have shown strong performance on many NLP and CV tasks, such as language modeling and image classification (Zoph and Le, 2017; Pham et al., 2018; Luo et al., 2018; Liu et al., 2019). Applications in NLP, such as NER (Jiang et al., 2019; Li et al., 2020), translation (So et al., 2019), text classification (Wang et al., 2020), and natural language inference (NLI) (Pasunuru and Bansal, 2019; Wang et al., 2020) have also been explored.…”
Section: Related Work (mentioning)
confidence: 99%
“…Current SOTA approaches focus on learning new cell architectures as replacements for LSTM or convolutional cells (Zoph and Le, 2017; Pham et al., 2018; Liu et al., 2019; Jiang et al., 2019; Li et al., 2020) or entire model architectures to replace hand-designed models such as the Transformer or DenseNet (So et al., 2019; Pham et al., 2018).…”
Section: Related Work (mentioning)
confidence: 99%