Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.696

Improving Low Compute Language Modeling with In-Domain Embedding Initialisation

Abstract: Many NLP applications, such as biomedical data and technical support, have 10-100 million tokens of in-domain data and limited computational resources for learning from it. How should we train a language model in this scenario? Most language modeling research considers either a small dataset with a closed vocabulary (like the standard 1 million token Penn Treebank), or the whole web with byte-pair encoding. We show that for our target setting in English, initialising and freezing input embeddings using in-domain […]
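
The central idea named in the abstract, initialising a language model's input embeddings from vectors trained on the in-domain corpus and then freezing them, can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch, not the authors' released code; `in_domain_vectors`, the vocabulary handling, and the random fallback for words without a pre-trained vector are assumptions.

```python
import torch
import torch.nn as nn

def build_frozen_embedding(vocab, in_domain_vectors, dim):
    """Build an input embedding layer initialised from in-domain word vectors
    (e.g. word2vec/GloVe trained on the 10-100M-token domain corpus) and freeze it."""
    weight = torch.zeros(len(vocab), dim)
    for idx, token in enumerate(vocab):
        vec = in_domain_vectors.get(token)  # hypothetical {token: vector} mapping
        if vec is not None:
            weight[idx] = torch.tensor(vec)
        else:
            # Tokens without an in-domain vector get a small random initialisation.
            weight[idx] = torch.randn(dim) * 0.1
    # freeze=True keeps the input embeddings fixed during language model training.
    return nn.Embedding.from_pretrained(weight, freeze=True)
```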

Citations: Cited by 2 publications (2 citation statements)
References: 22 publications
“…For the input embeddings of the newly added tokens, we adopt the approach of using the average embeddings of the subword tokens that make up these new tokens as in (Hewitt, 2021; Welch et al., 2020). This method utilizes the semantic richness of the model's existing subword embeddings to offer a meaningful starting point for the new tokens' representations.…”
Section: Preliminary 2: Subword-based Embeddings Initialization
confidence: 99%
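
The subword-averaging initialisation described in the excerpt above can be sketched as follows, assuming a HuggingFace-style tokenizer/model pair; `init_new_token_embeddings` and its arguments are illustrative names, not an API from the cited papers.

```python
import torch

def init_new_token_embeddings(model, tokenizer, new_tokens):
    # Decompose each new token into subword pieces *before* extending the vocab,
    # so the pieces come from the existing subword inventory.
    piece_ids_per_token = [
        tokenizer.encode(t, add_special_tokens=False) for t in new_tokens
    ]
    old_vocab_size = len(tokenizer)
    tokenizer.add_tokens(new_tokens)
    model.resize_token_embeddings(len(tokenizer))
    emb = model.get_input_embeddings().weight
    with torch.no_grad():
        for i, piece_ids in enumerate(piece_ids_per_token):
            if piece_ids:
                # New token's input embedding = mean of its subword embeddings.
                emb[old_vocab_size + i] = emb[piece_ids].mean(dim=0)
```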
“…We explored various hyperparameter configurations on our validation set and found the best results using dropout with the same mask for generic and demographic-specific embeddings, untied weights, and fixed input embeddings. Untying and fixing input embeddings is supported by concurrent work (Welch et al, 2020b). Each model is trained for 50 epochs.…”
Section: Language Modeling
confidence: 99%
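
A minimal sketch of the configuration mentioned in the excerpt above, untied input/output weights together with fixed, pre-initialised input embeddings, in a standard LSTM language model. The layer sizes and the `embedding_weights` tensor are placeholders, not hyperparameters from the cited work.

```python
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, embedding_weights, hidden_size=650):
        super().__init__()
        vocab_size, emb_size = embedding_weights.shape
        # Fixed input embeddings: initialised from pre-trained vectors, no gradient updates.
        self.encoder = nn.Embedding.from_pretrained(embedding_weights, freeze=True)
        self.lstm = nn.LSTM(emb_size, hidden_size, num_layers=2, batch_first=True)
        # Untied weights: the output projection is a separate, trainable matrix
        # (no weight sharing with self.encoder).
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, hidden=None):
        emb = self.encoder(tokens)
        output, hidden = self.lstm(emb, hidden)
        return self.decoder(output), hidden
```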