Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.598

German’s Next Language Model

Abstract: In this work we present the experiments that led to the creation of our BERT- and ELECTRA-based German language models, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM), we were able to attain state-of-the-art (SoTA) performance across a set of document classification and named entity recognition (NER) tasks for both models of base and large size. We adopt an evaluation-driven approach in training these models and our results indicate that both adding more data and …
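The released checkpoints are commonly obtained through the Hugging Face Hub; the snippet below is a minimal loading sketch, assuming the models are published under the deepset namespace (deepset/gbert-base, deepset/gelectra-base, and their large variants) and that the transformers library is installed.

# Minimal sketch: loading GBERT/GELECTRA with Hugging Face transformers.
# Assumes the checkpoints live under the deepset namespace on the Hub.
from transformers import AutoModel, AutoTokenizer

model_name = "deepset/gbert-base"  # e.g. "deepset/gelectra-large" for the large ELECTRA model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("Berlin ist die Hauptstadt von Deutschland.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768) for the base model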

Cited by 108 publications (98 citation statements). References 12 publications.
“…BERT model for Twin network - For the BERT embedding block shown in Figure 1, we use a pretrained German BERT model, gbert-base (Chan et al., 2020; Devlin et al., 2018), fine-tuned on annotated data sampled from live traffic. For fine-tuning we use a dataset of 1.5 million live traffic utterances.…”
Section: Methods (mentioning)
confidence: 99%
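The cited work does not include its twin-network code here; the following is an illustrative PyTorch sketch of the general pattern, a shared gbert-base encoder embedding both inputs, with hypothetical mean pooling and cosine similarity standing in for whatever head the authors actually use.

# Illustrative sketch only (not the cited authors' implementation): a twin network
# that shares one pretrained gbert-base encoder between both branches and compares
# mean-pooled utterance embeddings with cosine similarity.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TwinEncoder(nn.Module):
    def __init__(self, model_name="deepset/gbert-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)  # shared weights for both branches

    def embed(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling over real tokens

    def forward(self, left, right):
        e_left = self.embed(left["input_ids"], left["attention_mask"])
        e_right = self.embed(right["input_ids"], right["attention_mask"])
        return torch.cosine_similarity(e_left, e_right)

tokenizer = AutoTokenizer.from_pretrained("deepset/gbert-base")
model = TwinEncoder()
a = tokenizer("Mach das Licht im Wohnzimmer an", return_tensors="pt")
b = tokenizer("Licht im Wohnzimmer einschalten", return_tensors="pt")
print(model(a, b))  # similarity score between the two utterances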
“…For experiments with fine-tuning, we use language-specific BERT models for German (Chan et al., 2020), Spanish (Cañete et al., 2020), Dutch (de Vries et al., 2019), Finnish (Virtanen et al., 2019), Danish, and Croatian (Ulčar and Robnik-Šikonja, 2020), while we use mBERT (Devlin et al., 2019) for Afrikaans.…”
Section: Low Resource Setting (mentioning)
confidence: 99%
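A common way to wire up such per-language fine-tuning is a small registry that falls back to mBERT when no dedicated checkpoint exists; the sketch below assumes this pattern and only fills in identifiers we are confident about (gbert-base and multilingual BERT), leaving the remaining language-specific checkpoints as placeholders.

# Sketch of a language-to-checkpoint registry with an mBERT fallback.
# Only the German and multilingual identifiers are ones we are sure exist on the
# Hugging Face Hub; the commented entries stand for the checkpoints cited above.
from transformers import AutoModel, AutoTokenizer

LANGUAGE_MODELS = {
    "de": "deepset/gbert-base",  # Chan et al., 2020
    # "es", "nl", "fi", "da", "hr": language-specific BERTs cited in the excerpt
}
FALLBACK = "bert-base-multilingual-cased"  # mBERT (Devlin et al., 2019)

def load_encoder(lang):
    name = LANGUAGE_MODELS.get(lang, FALLBACK)
    return AutoTokenizer.from_pretrained(name), AutoModel.from_pretrained(name)

# Afrikaans has no dedicated entry, so mBERT is used:
tokenizer, encoder = load_encoder("af")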
“…Regarding the classification, we fine-tune the pretrained gbert-base model (Chan et al., 2020), which has 110M parameters and is the best performing German transformer model for text classification at this number of parameters. While there is a larger gbert model available, we opted for the base variant due to its efficiency, which results in lower turnaround times of an AL step for the practitioner.…”
Section: Model and Training (mentioning)
confidence: 99%
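As a rough illustration of this setup (not the cited authors' training code), the sketch below puts a sequence-classification head on top of the 110M-parameter gbert-base encoder and runs one cross-entropy training step on two toy German examples.

# Minimal fine-tuning sketch for German text classification with gbert-base.
# The texts, labels, and hyperparameters are toy values for illustration only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "deepset/gbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["Der Service war ausgezeichnet.", "Die Lieferung kam viel zu spät."]
labels = torch.tensor([1, 0])  # toy positive/negative labels

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # Hugging Face models return a loss when labels are given
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(outputs.loss))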