Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
2020 · Preprint
DOI: 10.48550/arxiv.2002.06305


Cited by 134 publications (194 citation statements)
References 0 publications
“…Quantization is always applied to the median checkpoint for the respective task. We exclude the problematic WNLI task (Levesque et al, 2012), as it has relatively small dataset and shows an unstable behaviour (Dodge et al, 2020), in particular due to several issues with the way the dataset was constructed 2 .…”
Section: B Experimental Details, B1 FP32 Fine-tuning Details (mentioning)
confidence: 99%
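One plausible reading of the "median checkpoint" mentioned above is the fine-tuning run whose dev-set score is the median across runs for that task. A minimal sketch under that assumption (checkpoint names and scores are illustrative placeholders, not the cited work's code):

```python
# Sketch: pick the run whose dev score is the median across fine-tuning runs,
# then use that checkpoint as the starting point for quantization.
from statistics import median

def select_median_checkpoint(dev_scores: dict) -> str:
    """Return the checkpoint path whose dev score is closest to the median score."""
    med = median(dev_scores.values())
    return min(dev_scores, key=lambda ckpt: abs(dev_scores[ckpt] - med))

# Hypothetical example: five fine-tuning runs of the same task with different seeds.
dev_scores = {
    "run_seed0/checkpoint-best": 0.871,
    "run_seed1/checkpoint-best": 0.884,
    "run_seed2/checkpoint-best": 0.879,
    "run_seed3/checkpoint-best": 0.862,
    "run_seed4/checkpoint-best": 0.890,
}
print(select_median_checkpoint(dev_scores))  # -> "run_seed2/checkpoint-best"
```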
“…(2) RefCOCO+ (Yu et al, 2016) (3) Robust evaluation. Previous works show that model training on limited data can suffer from instability (Dodge et al, 2020;. For a robust and comprehensive evaluation, we report mean results from 5 random training and validation set split, as well as the standard deviation.…”
Section: Experimental Settings (mentioning)
confidence: 99%
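As context for the robust-evaluation protocol quoted above, a minimal sketch of averaging a metric over several random train/validation splits and reporting its standard deviation (the split routine and the `train_and_evaluate` callback are hypothetical placeholders, not the cited work's code):

```python
# Sketch: evaluate with n random train/validation splits and report mean ± std.
import random
import statistics

def robust_evaluation(examples, train_and_evaluate, n_splits=5, val_fraction=0.1):
    scores = []
    for split_id in range(n_splits):
        shuffled = list(examples)
        random.Random(split_id).shuffle(shuffled)   # a different random split per run
        n_val = max(1, int(len(shuffled) * val_fraction))
        val_set, train_set = shuffled[:n_val], shuffled[n_val:]
        scores.append(train_and_evaluate(train_set, val_set))
    return statistics.mean(scores), statistics.stdev(scores)
```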
“…Environmental Impact The use of large-scale Transformers requires a lot of computations and GPUs/TPUs for training, which contributes to global warming (Strubell et al, 2019;Schwartz et al, 2020). This is a smaller issue in our case, as we do not train such models from scratch; rather, we fine-tune them on relatively small datasets.…”
Section: Ethics and Broader Impact (mentioning)
confidence: 99%
“…Pre-trained transformers often suffer from instability of the results across multiple reruns with different random seeds. This usually happens with small training datasets (Dodge et al, 2020;Mosbach et al, 2021). In such cases, typically multiple reruns are performed, and the average value over these reruns is reported.…”
Section: E Impact of the Random Seed (mentioning)
confidence: 99%
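A minimal sketch of the rerun-and-average practice described above, under the assumption that the seed controls both the weight initialization of the task head and the training data order (the two sources of instability studied by Dodge et al., 2020); the `build_classifier_head` and `train_and_eval` callbacks are hypothetical, not any paper's actual code:

```python
# Sketch: rerun fine-tuning with several seeds and report the mean dev metric.
import random
import statistics
import torch

def run_with_seed(seed, build_classifier_head, train_examples, train_and_eval):
    torch.manual_seed(seed)                     # controls weight init of the new head
    head = build_classifier_head()
    order = list(train_examples)
    random.Random(seed).shuffle(order)          # controls training data order
    return train_and_eval(head, order)          # hypothetical routine -> dev metric

def mean_over_seeds(seeds=(0, 1, 2, 3, 4), **run_kwargs):
    scores = [run_with_seed(s, **run_kwargs) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)
```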