Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.394

An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training

Abstract: Pre-training large language models has become a standard in the natural language processing community. Such models are pre-trained on generic data (e.g., BookCorpus and English Wikipedia) and often fine-tuned on tasks in the same domain. However, in order to achieve state-of-the-art performance on out-of-domain tasks such as clinical named entity recognition and relation extraction, additional in-domain pre-training is required. In practice, staged multi-domain pre-training presents performance deterioration in …

Cited by 11 publications (9 citation statements) · References 29 publications
“…The results are shown in Table 5. Among the sentence-wise and the token-wise classification variants, SQP(EWC+LRD) gives the best results in terms of exact match, ROUGE-L F1, and S+WMS scores, while the SQP(SLR) and SQP(FT RB) variants perform the poorest, which is consistent with the results in Arumae et al. (2020). It only produces short answers, and hence has high precision but is poor on all other counts.…”
Section: Results (supporting)
confidence: 82%
“…(a). Using a learning rate that linearly decreases by a constant factor (LRD) from one layer to the next, with the outermost language-modeling head layer having the maximum learning rate, as in Arumae et al. (2020). This enforces a constraint that the outer layers adapt more to the E-Manual domain, while the inner layers' weights change little, thus encouraging them to primarily retain the knowledge of the generic domain.…”
Section: Pre-training on the E-Manuals Corpus (mentioning)
confidence: 99%
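
The layer-wise learning-rate decay (LRD) described in the excerpt above can be sketched with standard PyTorch optimizer parameter groups. The snippet below is only a minimal illustration, not the setup used by Arumae et al. (2020) or the citing work: the multiplicative per-layer decay, the base learning rate, and the choice of a RoBERTa masked-LM model are all assumptions made for the example.

# Minimal sketch of layer-wise learning rate decay (LRD) with PyTorch
# optimizer parameter groups. The model choice, base_lr, and the
# multiplicative per-layer decay are illustrative assumptions.
import re
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("roberta-base")
num_layers = model.config.num_hidden_layers

base_lr = 5e-5   # maximum rate, used for the outermost (LM head) parameters
decay = 0.9      # constant factor applied from one layer to the next

def depth_of(param_name: str) -> int:
    # Depth 0 = LM head (max LR); encoder layers are counted from the top;
    # embeddings sit deepest and get the smallest learning rate.
    match = re.search(r"encoder\.layer\.(\d+)\.", param_name)
    if match:
        return num_layers - int(match.group(1))
    if "embeddings" in param_name:
        return num_layers + 1
    return 0

param_groups = [
    {"params": [param], "lr": base_lr * decay ** depth_of(name)}
    for name, param in model.named_parameters()
    if param.requires_grad
]
optimizer = torch.optim.AdamW(param_groups)

Because the learning rate shrinks with depth toward the embeddings, the inner layers move less during in-domain pre-training, which is the mechanism the excerpt credits with retaining generic-domain knowledge.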
“…A number of recent works study LM adaptation to a new domain that is different from the original pretraining domain. They show performance gains on downstream tasks in the new domain (Gururangan et al., 2020; Yao et al., 2021), as well as retention of knowledge learned in the general domain when certain regularizations are applied (Arumae et al., 2020). However, continual (sequential) pretraining over multiple distinct corpora is less studied: it is unclear how the LM can continually acquire and accumulate knowledge from different corpora to benefit downstream tasks in a new domain or on more recent data, and whether the LM can retain the knowledge learned from earlier corpora to preserve decent performance on seen domains.…”
Section: Introduction (mentioning)
confidence: 99%
“…CF mitigation is particularly important in clinical NLP given that many clinical datasets are quite different from generic domains and from each other. Arumae et al [2020] explore CF in language modeling when transferring between the generic, clinical, and biomedical domains and compare learning rate control, experience replay, and EWC.…”
Section: Discussion (mentioning)
confidence: 99%
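
The comparison in the last excerpt covers learning rate control, experience replay, and elastic weight consolidation (EWC) as ways to mitigate catastrophic forgetting (CF). As a rough illustration of the EWC idea only, the sketch below adds a quadratic penalty that anchors parameters to their generic-domain values, weighted by a diagonal Fisher information estimate; the function name, the lambda value, and the assumption that the Fisher diagonal is precomputed are illustrative, not details from Arumae et al. [2020].

# Rough sketch of an EWC-style penalty for continued in-domain pre-training.
# `old_params` and `fisher_diag` map parameter names to tensors saved after
# generic-domain pre-training; ewc_lambda is an assumed placeholder value.
import torch

def ewc_penalty(model, old_params, fisher_diag, ewc_lambda=0.1):
    # Quadratic pull toward the generic-domain weights, scaled by how
    # important each parameter was (diagonal Fisher information).
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in old_params:
            penalty = penalty + (fisher_diag[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * ewc_lambda * penalty

# During in-domain pre-training the penalty is simply added to the task loss:
#   loss = mlm_loss + ewc_penalty(model, old_params, fisher_diag)
#   loss.backward(); optimizer.step()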