2020 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn48605.2020.9206891

Forget Me Not: Reducing Catastrophic Forgetting for Domain Adaptation in Reading Comprehension

Abstract: The creation of large-scale open domain reading comprehension data sets in recent years has enabled the development of end-to-end neural comprehension models with promising results. To use these models for domains with limited training data, one of the most effective approaches is to first pretrain them on large out-of-domain source data and then finetune them with the limited target data. The caveat is that, after fine-tuning, the comprehension models tend to perform poorly in the source domain, a phenome…
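
To make the abstract's recipe concrete, here is a minimal runnable sketch of the pretrain-then-fine-tune setup it describes, with a toy classifier and random tensors standing in for a real comprehension model and QA corpora; every name and hyperparameter below is illustrative, not taken from the paper.

```python
# Toy illustration of source-domain pretraining followed by target-domain fine-tuning,
# and of measuring how source performance degrades afterwards (catastrophic forgetting).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_loader(n=256, dim=32, n_cls=4, seed=0):
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, dim, generator=g)
    y = torch.randint(0, n_cls, (n,), generator=g)
    return DataLoader(TensorDataset(x, y), batch_size=32)

def train(model, loader, epochs=3, lr=1e-3):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(model(x), y).backward()
            opt.step()

@torch.no_grad()
def accuracy(model, loader):
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(-1) == y).sum().item()
        total += y.numel()
    return correct / total

source_loader = make_loader(seed=0)   # stands in for large out-of-domain source data
target_loader = make_loader(seed=1)   # stands in for limited target-domain data

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
train(model, source_loader)                 # "pretrain" on the source domain
src_before = accuracy(model, source_loader)
train(model, target_loader)                 # fine-tune on the target domain
src_after = accuracy(model, source_loader)  # typically drops: the model "forgets" the source
print(f"source accuracy before fine-tuning: {src_before:.2f}, after: {src_after:.2f}")
```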

Cited by 32 publications (16 citation statements) | References 13 publications

“…This may be due to a local minima problem or catastrophic forgetting. As shown by Xu et al., catastrophic forgetting can happen during the fine-tuning step, by overwriting previous knowledge of the model with new, distinct knowledge, leading to a loss of information in lower layers (Xu et al., 2019). This may have occurred because the linguistic characteristics of clinical texts are very different from the Portuguese corpus used during the pre-training phase of Portuguese BERT.…”
Section: Effect of Language (mentioning)
confidence: 99%
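
The claim about lower layers can be probed with a simple diagnostic. The sketch below is an illustration, not something taken from Xu et al. or the citing paper: it measures how far each named parameter moves during fine-tuning, and disproportionate drift in lower layers is one symptom of the overwriting described in the quote.

```python
# Hedged diagnostic sketch: per-layer parameter drift between a snapshot taken
# before fine-tuning and the fine-tuned model. Layer names come from the model itself.
import copy

def layer_drift(model_before, model_after):
    """L2 distance between corresponding parameters, keyed by parameter name."""
    after = dict(model_after.named_parameters())
    return {name: (after[name].detach() - p.detach()).norm().item()
            for name, p in model_before.named_parameters()}

# Usage, reusing `model`, `train`, and `target_loader` from the earlier sketch:
# snapshot = copy.deepcopy(model)
# train(model, target_loader)
# for name, d in sorted(layer_drift(snapshot, model).items()):
#     print(f"{name}: {d:.3f}")
```
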
“…For the source performance, we see a substantial drop (20-40 accuracy points) after cross-lingual transfer (e.g. "XLMR+source" vs. "MF-XLMR+ST"), implying there is catastrophic forgetting [9,12,32,35] - the phenomenon where adapted neural models "forget" and perform poorly in the original domain/task. When we incorporate gold labels in the source domain in the self-training loop ("MF-multiBERT+ST+GL" or "MF-XLMR+ST+GL"), we made a surprising observation: not only was catastrophic forgetting overcome, but the source performance actually surpasses some supervised monolingual models, e.g.…”
Section: Results (mentioning)
confidence: 92%
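
The "gold labels in the source domain" idea from this quote can be sketched as a self-training loop that mixes pseudo-labelled target batches with gold-labelled source batches, so the model keeps seeing source supervision while it adapts. The loader structure, teacher model, and equal loss weighting below are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative self-training loop with retained gold source labels (assumed setup):
# `target_unlabelled` yields batches [x_t], `source_gold` yields batches (x_s, y_s),
# and `teacher` is a fixed model that provides pseudo-labels for the target data.
import itertools
import torch
import torch.nn as nn

def self_train_with_gold_source(student, teacher, target_unlabelled, source_gold,
                                epochs=3, lr=1e-3):
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    student.train()
    for _ in range(epochs):
        for (x_t,), (x_s, y_s) in zip(target_unlabelled, itertools.cycle(source_gold)):
            with torch.no_grad():
                pseudo = teacher(x_t).argmax(-1)                       # pseudo-labels on target data
            loss = (nn.functional.cross_entropy(student(x_t), pseudo)  # target batch, pseudo-labels
                    + nn.functional.cross_entropy(student(x_s), y_s))  # source batch, gold labels
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```
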
“…Since such changes in word probability are likely to impact the decoder and joint networks more than the encoder, a regularisation for the decoder and joint networks may also be required during fine-tuning. We experiment with EWC [17,20] for this purpose, with its loss function formulated as:…”
Section: Elastic Weight Consolidation (mentioning)
confidence: 99%
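
The excerpt above is cut off before the formula, so the following shows only the standard EWC penalty of Kirkpatrick et al. (2017), L(θ) = L_task(θ) + (λ/2) Σ_i F_i (θ_i − θ_i*)², as a PyTorch-style sketch; the exact formulation used in the citing paper may differ.

```python
# Standard (diagonal-Fisher) EWC penalty, sketched under the assumption that `fisher`
# and `ref_params` are dicts keyed by parameter name, both computed on the source task.
import torch

def ewc_penalty(model, ref_params, fisher, lam=1.0):
    """(lam/2) * sum_i F_i * (theta_i - theta_i*)^2, summed over matching parameters."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - ref_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# During fine-tuning, the total loss would then be:
#   task_loss + ewc_penalty(model, ref_params, fisher)
```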