Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.100

Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning

Abstract: Recently, leveraging pre-trained Transformer-based language models in downstream, task-specific models has advanced state-of-the-art results in natural language understanding tasks. However, little research has explored the suitability of this approach in low-resource settings with fewer than 1,000 training data points. In this work, we explore fine-tuning methods for BERT, a pre-trained Transformer-based language model, by utilizing pool-based active learning to speed up training while keeping the cost …
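The abstract describes pool-based active learning for fine-tuning BERT in low-resource settings. As a rough illustration of that idea only, the sketch below scores an unlabelled pool by predictive entropy and picks the most uncertain examples for annotation; the model name, batch size, entropy acquisition function, and helper names are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of pool-based active learning with uncertainty (entropy) sampling.
# Assumptions: bert-base-uncased, a binary task, and hypothetical helper names.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def entropy_scores(texts, batch_size=16):
    """Return predictive entropy per unlabelled text (higher = more uncertain)."""
    scores = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size], padding=True,
                              truncation=True, return_tensors="pt")
            probs = torch.softmax(model(**batch).logits, dim=-1)
            ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
            scores.extend(ent.tolist())
    return scores

def select_for_labeling(unlabelled_pool, k=50):
    """Pick the k most uncertain pool examples to send to annotators."""
    scores = entropy_scores(unlabelled_pool)
    ranked = sorted(range(len(unlabelled_pool)), key=lambda i: scores[i], reverse=True)
    return [unlabelled_pool[i] for i in ranked[:k]]
```

In a full active-learning loop, the selected examples would be labelled, added to the training set, and the model fine-tuned again before the next acquisition round.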


Cited by 23 publications (11 citation statements)
References: 36 publications
“…From the above, we conclude that uncertainty-based AL with BERT base can be used to decrease labeling effort. This supports what was concluded by [10].…”
Section: Discussion (supporting)
Confidence: 93%
“…Therefore, we systematically study advanced layer-specific adaptation techniques previously studied in the general domains: freezing pretrained parameters in the lower layers (Grießhaber et al., 2020), adopting layerwise learning-rate decay (Clark et al., 2020), and reinitializing parameters in the top layer (Zhang et al., 2021). See Figure 1.…”
Section: Fine-tuning Stability (mentioning)
Confidence: 99%
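The citation statement above names three layer-specific adaptation techniques. The sketch below shows one plausible way to apply them to a bert-base sequence classifier with Hugging Face Transformers; the number of frozen layers, the decay factor, and the learning rates are illustrative assumptions, not values from the cited works.

```python
# Sketch of the three layer-specific adaptation techniques, under assumed settings.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# 1) Freeze pretrained parameters in the lower encoder layers (here: the first 6 of 12).
for layer in model.bert.encoder.layer[:6]:
    for p in layer.parameters():
        p.requires_grad = False

# 2) Layer-wise learning-rate decay: higher layers keep a larger learning rate,
#    lower layers a smaller one (base_lr and decay are assumed values).
base_lr, decay = 2e-5, 0.95
num_layers = len(model.bert.encoder.layer)  # 12 for bert-base
param_groups = []
for depth, layer in enumerate(model.bert.encoder.layer):
    lr = base_lr * decay ** (num_layers - 1 - depth)
    params = [p for p in layer.parameters() if p.requires_grad]
    if params:
        param_groups.append({"params": params, "lr": lr})
param_groups.append({"params": model.classifier.parameters(), "lr": base_lr})
optimizer = torch.optim.AdamW(param_groups)

# 3) Re-initialize the parameters of the top encoder layer before fine-tuning
#    (_init_weights is the library's internal per-module initializer).
model.bert.encoder.layer[-1].apply(model._init_weights)
```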
“…By pretraining on unlabeled text, large neural language models facilitate transfer learning and have demonstrated spectacular success for a wide range of NLP applications (Devlin et al., 2019; Liu et al., 2019). Fine-tuning these large neural models for specific tasks, however, remains challenging, as has been shown in the general domain (Grießhaber et al., 2020; Mosbach et al., 2021; Zhang et al., 2021). For biomedicine, the challenge is further exacerbated by the scarcity of task-specific training data because annotation requires domain expertise and crowd-sourcing is harder to apply.…”
Section: Introduction (mentioning)
Confidence: 99%
“…A linear model is added to the embedding output to predict the score for the labels. Previous research has established that active learning can increase the performance of Transformer-based text classifiers (Grießhaber et al, 2020). With the second option, the system uses the same classification outputs but unlabelled instances are taken from each class in equal amounts.…”
Section: Active Learning (mentioning)
Confidence: 99%
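The statement above describes a linear model on the encoder's embedding output that predicts label scores, plus a sampling option that takes unlabelled instances from each class in equal amounts. A minimal sketch of both pieces, assuming a [CLS]-pooled BERT encoder and hypothetical helper names (predict_labels, balanced_sample), neither taken from the citing paper:

```python
# Sketch: linear head over BERT [CLS] embeddings + equal-per-class sampling.
import torch
from collections import defaultdict
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
num_labels = 3  # assumed label count
linear_head = torch.nn.Linear(encoder.config.hidden_size, num_labels)

def predict_labels(texts):
    """Score each text with the linear head on its [CLS] embedding."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        cls_emb = encoder(**batch).last_hidden_state[:, 0]  # [CLS] token embedding
        return linear_head(cls_emb).argmax(dim=-1).tolist()

def balanced_sample(unlabelled_texts, per_class=10):
    """Take the same number of unlabelled instances from each predicted class."""
    by_class = defaultdict(list)
    for text, label in zip(unlabelled_texts, predict_labels(unlabelled_texts)):
        by_class[label].append(text)
    return [t for texts in by_class.values() for t in texts[:per_class]]
```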