Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2021
DOI: 10.18653/v1/2021.acl-short.89
A Simple Recipe for Multilingual Grammatical Error Correction

Abstract: This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets we surpass the previous state-of-the-art results on GEC benchmarks in four languages: English, Czech, German and Russian. Having established…
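The recipe in the abstract has two ingredients: language-agnostic synthetic pre-training data and fine-tuning a large multilingual language model on language-specific supervised pairs. The sketch below illustrates only the fine-tuning step, assuming the Hugging Face transformers API; the mT5-small checkpoint, toy sentence pairs, and hyperparameters are illustrative stand-ins, not the authors' actual setup (the paper scales to models of up to 11B parameters).

```python
# Minimal sketch of the fine-tuning ingredient: a pretrained multilingual
# seq2seq model is fine-tuned on language-specific (erroneous, corrected)
# pairs. Checkpoint, pairs, and hyperparameters are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

# Toy language-specific supervised pairs (source = erroneous, target = corrected).
pairs = [
    ("She go to school yesterday.", "She went to school yesterday."),
    ("Ich habe ein Apfel gegessen.", "Ich habe einen Apfel gegessen."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for src, tgt in pairs:
    batch = tokenizer(src, text_target=tgt, return_tensors="pt")
    loss = model(**batch).loss  # cross-entropy against the corrected target
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```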

Cited by 73 publications (103 citation statements)
References 16 publications

Correcting diacritics and typos with a ByT5 transformer model
Stankevičius, Lukoševičius, Kapočiūtė-Dzikienė et al. (2022). Preprint.
“…A limitation of our work is that we had only a single moderate GPU at our disposal. Scaling model size [105], incorporating additional datasets [46], and training longer can improve accuracy by several percent. Similarly, one can build a model of multiple languages to gain benefits by overlapping vocabularies and semantics of related under-represented languages, although studies report contradictory results [48,46].…”
Section: Discussion
confidence: 99%
“…The popular seq2seq transformer T5 [99] used batch size 128 for both pre-training and fine-tuning. Follow-up models such as the multilingual version mT5 [100], the grammatical error correction model gT5 [105], and ByT5 [8] (the one we use in this work) all carried on with the same value for fine-tuning. The same size is also used in works solving the diacritics restoration task [47,106].…”
Section: Batch Size
confidence: 99%
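For context on the fine-tuning batch size of 128 mentioned in the statement above, here is a hedged sketch of how such an effective batch is commonly reached on a single GPU: a small per-device batch combined with gradient accumulation, expressed with the Hugging Face Seq2SeqTrainingArguments. The specific values and the output directory are assumptions for illustration, not settings reported in any of the cited papers.

```python
# Illustrative only: reaching an effective fine-tuning batch size of 128 on a
# single GPU by combining a small per-device batch with gradient accumulation
# (8 x 16 = 128). The values below are assumptions, not taken from the papers.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="finetune-out",         # hypothetical output directory
    per_device_train_batch_size=8,     # what fits in GPU memory
    gradient_accumulation_steps=16,    # 8 * 16 = effective batch size of 128
    learning_rate=1e-4,
    num_train_epochs=3,
)
```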

“…Grundkiewicz et al (2019) approach GEC as a neural machine translation task using the Transformer architecture (Vaswani et al, 2017), which is pre-trained using a vast amount of synthetic data generated by character-level and word-level edits. Recently, Rothe et al (2021) presented a GEC system based on multilingual mT5 (Xue et al, 2021b), reaching state-of-the-art results on several datasets with the gigantic xxl model size with 13B parameters.…”
Section: Related Work
confidence: 99%
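As a rough illustration of the character-level and word-level edits mentioned in the statement above, the sketch below corrupts clean text to create synthetic (source, target) pairs for pre-training. The edit operations and probabilities are assumptions, not the exact procedure of Grundkiewicz et al. (2019) or Rothe et al. (2021).

```python
# Sketch of language-agnostic synthetic-data generation: clean text is
# corrupted with random character- and word-level edits, and the model is
# trained to map corrupted -> clean. Operations/probabilities are illustrative.
import random

def corrupt(sentence: str, p_char: float = 0.05, p_word: float = 0.1) -> str:
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    words = []
    for word in sentence.split():
        if random.random() < p_word:          # word-level edit: drop the word
            continue
        chars = list(word)
        for i in range(len(chars)):
            if random.random() < p_char:      # character-level edit
                op = random.choice(["delete", "substitute", "swap"])
                if op == "delete":
                    chars[i] = ""
                elif op == "substitute":
                    chars[i] = random.choice(alphabet)
                elif op == "swap" and i + 1 < len(chars):
                    chars[i], chars[i + 1] = chars[i + 1], chars[i]
        words.append("".join(chars))
    return " ".join(words)

# A synthetic training pair: (corrupted source, clean target).
clean = "The quick brown fox jumps over the lazy dog."
pair = (corrupt(clean), clean)
```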
“…For lexical normalization, we could directly use the unnormalized sentence as input and the normalized sentence as output (an approach used by Rothe et al (2021) for GEC). However, we were concerned that such an approach would be too different from the ByT5 pre-training, and furthermore, it would not allow to reconstruct the alignment of the normalized tokens when a word is removed during normalization or split into several words.…”
Section: Input and Output Format
confidence: 99%
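To make the alignment concern in the statement above concrete, the sketch below contrasts a sentence-level seq2seq format with a word-level format that preserves token alignment by construction. The sentinel markers and example words are illustrative assumptions, not the cited system's exact format.

```python
# Sketch contrasting the two input/output formats discussed above.
# The sentence-level format treats normalization as plain seq2seq; the
# word-level format (an illustrative variant) keeps a 1:1 alignment by
# normalizing one marked word at a time within its sentence context.
raw = ["new", "pix", "comming", "tomoroe"]
normalized = ["new", "pictures", "coming", "tomorrow"]

# 1) Sentence-level seq2seq: alignment is lost when a word is dropped,
#    merged, or split during normalization.
sentence_pair = (" ".join(raw), " ".join(normalized))

# 2) Word-level seq2seq: one training example per word; the target is that
#    word's normalization only, so token alignment is preserved.
word_pairs = []
for i, word in enumerate(raw):
    marked = " ".join(raw[:i] + ["<w>", word, "</w>"] + raw[i + 1:])
    word_pairs.append((marked, normalized[i]))
```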