German's Next Language Model
2020 · Preprint
DOI: 10.48550/arxiv.2010.10906

Abstract: In this work we present the experiments which led to the creation of our BERT and ELECTRA based German language models, GBERT and GELECTRA. By varying the input training data, model size, and the presence of Whole Word Masking (WWM) we were able to attain SoTA performance across a set of document classification and named entity recognition (NER) tasks for both models of base and large size. We adopt an evaluation driven approach in training these models and our results indicate that both adding more data and …
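As a rough illustration of how released checkpoints of this kind are typically used downstream, here is a minimal sketch of loading a German BERT model for a classification task. The model id deepset/gbert-base and the two-label setup are assumptions for the example, not details taken from the abstract.

```python
# Hedged sketch: loading a released German BERT checkpoint for fine-tuning.
# "deepset/gbert-base" is an assumed Hugging Face model id; substitute the
# checkpoint you actually intend to use.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "deepset/gbert-base"  # assumption for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

inputs = tokenizer("Das ist ein Beispielsatz.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, num_labels); fine-tuning on labeled data would follow
```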

Cited by 11 publications (13 citation statements) · References 15 publications

Citation statements (ordered by relevance):
“…In addition, we list the best results that were achieved without a Transformer architecture (Riedl and Padó 2018). While our approach outperforms the BiLSTM results of Riedl and Padó (2018), we were not able to reach the results of Chan, Schweter, and Möller (2020). One of the reasons is that our BERTs are smaller and also pre-trained on a much smaller text corpus.…”
Section: Table 10 (mentioning)
confidence: 91%
“…This means that only either all tokens belonging to a word are masked or none of the tokens. Recent work of Chan, Schweter, and Möller (2020) and Cui et al (2019) already showed the positive effect of WWM in pre-training on the performance of the downstream task. In this article, we also examine the differences between the original MLM task and the MLM task with WWM.…”
Section: Whole-Word Masking (WWM) (mentioning)
confidence: 99%
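For readers unfamiliar with WWM, the following is a minimal sketch of the idea described in that statement (either all WordPiece sub-tokens of a word are masked or none are). It is an illustration of the technique, not the authors' pre-training code; the masking probability and [MASK] token name are the usual BERT conventions.

```python
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Mask WordPiece tokens so that every sub-token of a chosen word is
    replaced by [MASK], or none of them is (whole-word masking sketch)."""
    # Group sub-token indices into whole words: in BERT's WordPiece
    # convention a token starting with "##" continues the previous word.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    masked = list(tokens)
    for word in words:
        if random.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked

# Example: "Sprach" + "##modell" is masked as a unit or left intact as a unit.
print(whole_word_mask(["Das", "Sprach", "##modell", "ist", "gut"], mask_prob=0.5))
```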
“…For experiments with fine-tuning, we use language-specific BERT models for German (Chan et al, 2020), Spanish (Canete et al, 2020), Dutch (de Vries et al, 2019), Finnish (Virtanen et al, 2019), Danish, Croatian (Ulčar and Robnik-Šikonja, 2020), while we use mBERT (Devlin et al, 2019) for Afrikaans.…”
Section: Experimental Protocol (mentioning)
confidence: 99%
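A sketch of the kind of setup that statement describes, i.e. picking a language-specific BERT checkpoint and fine-tuning it for a token-level task such as NER, is shown below. The model ids in the dictionary and the label count are assumptions for illustration, not taken from the cited paper.

```python
# Hedged sketch: selecting a language-specific BERT checkpoint for token
# classification (e.g. NER). Model ids here are assumed examples.
from transformers import AutoTokenizer, AutoModelForTokenClassification

checkpoints = {
    "de": "deepset/gbert-base",           # German (assumed id)
    "nl": "GroNLP/bert-base-dutch-cased", # Dutch (assumed id)
}

lang = "de"
tokenizer = AutoTokenizer.from_pretrained(checkpoints[lang])
model = AutoModelForTokenClassification.from_pretrained(checkpoints[lang], num_labels=9)

enc = tokenizer("Angela Merkel besuchte Berlin.", return_tensors="pt")
logits = model(**enc).logits  # one row of label scores per sub-token
print(logits.shape)
```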