Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning

Grießhaber, Daniel; Maucher, Johannes; Vu, Ngoc Thang

doi:10.18653/v1/2020.coling-main.100

Cited by 23 publications

(11 citation statements)

References 36 publications


“…From the above, we conclude that uncertainty-based AL with BERT base can be used to decrease labeling effort. This supports what was concluded by [10].…”

Section: Discussion (supporting)

confidence: 93%

Active learning for reducing labeling effort in text classification tasks

Jacobs, Wenniger, Wiering et al. 2021

Preprint


Labeling data can be an expensive task as it is usually performed manually by domain experts. This is cumbersome for deep learning, as it is dependent on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by using only the data that the model deems most informative. Little research has been done on AL in a text classification setting and next to none has involved the more recent, state-of-the-art NLP models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT base as the classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL, namely that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. Whereas the proposed heuristics did not improve the performance of AL, our results show that using uncertainty-based AL with BERT base outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger.


“…Therefore, we systematically study advanced layer-specific adaptation techniques previously studied in the general domains: freezing pretrained parameters in the lower layers (Grießhaber et al., 2020), adopting layerwise learning-rate decay (Clark et al., 2020), and reinitializing parameters in the top layer (Zhang et al., 2021). See Figure 1.…”

Section: Fine-tuning Stability (mentioning)

confidence: 99%

“…By pretraining on unlabeled text, large neural language models facilitate transfer learning and have demonstrated spectacular success for a wide range of NLP applications (Devlin et al., 2019; Liu et al., 2019). Fine-tuning these large neural models for specific tasks, however, remains challenging, as has been shown in the general domain (Grießhaber et al., 2020; Mosbach et al., 2021; Zhang et al., 2021). For biomedicine, the challenge is further exacerbated by the scarcity of task-specific training data because annotation requires domain expertise and crowd-sourcing is harder to apply.…”

Section: Introduction (mentioning)

confidence: 99%

Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing

Tinn, Cheng, Gu et al. 2021

Preprint


Motivation: A perennial challenge for biomedical researchers and clinical practitioners is to stay abreast with the rapid growth of publications and medical notes. Natural language processing (NLP) has emerged as a promising direction for taming information overload. In particular, large neural language models facilitate transfer learning by pretraining on unlabeled text, as exemplified by the successes of BERT models in various NLP applications. However, fine-tuning such models for an end task remains challenging, especially with small labeled datasets, which are common in biomedical NLP. Results: We conduct a systematic study on fine-tuning stability in biomedical NLP. We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains. Large models have potential to attain better performance, but increasing model size also exacerbates fine-tuning instability. We thus conduct a comprehensive exploration of techniques for addressing fine-tuning instability. We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications. Specifically, freezing lower layers is helpful for standard BERT-BASE models, while layerwise decay is more effective for BERT-LARGE and ELECTRA models. For low-resource text similarity tasks such as BIOSSES, reinitializing the top layer is the optimal strategy. Overall, domain-specific vocabulary and pretraining facilitate more robust models for fine-tuning. Based on these findings, we establish new state of the art on a wide range of biomedical NLP applications. Availability and implementation: To facilitate progress in biomedical NLP, we release our state-of-the-art pretrained and fine-tuned models: https://aka.ms/BLURB.


“…A linear model is added to the embedding output to predict the score for the labels. Previous research has established that active learning can increase the performance of Transformer-based text classifiers (Grießhaber et al., 2020). With the second option, the system uses the same classification outputs but unlabelled instances are taken from each class in equal amounts.…”

Section: Active Learning (mentioning)

confidence: 99%

Paladin: an annotation tool based on active and proactive learning

Nghiem, Baylis, Ananiadou 2021

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrati


In this paper, we present Paladin, an open-source web-based annotation tool for creating high-quality multi-label document-level datasets. By integrating active learning and proactive learning into the annotation task, Paladin makes the task less time-consuming and reduces the human effort it requires. Although Paladin is designed for multi-label settings, the system is flexible and can be adapted to other tasks in single-label settings.
