2022
DOI: 10.48550/arxiv.2203.13064
Preprint

Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction

Abstract: In this paper, we investigate improvements to the GEC sequence tagging architecture with a focus on ensembling of recent cutting-edge Transformer-based encoders in Large configurations. We encourage ensembling models by majority votes on span-level edits because this approach is tolerant to the model architecture and vocabulary size. Our best ensemble achieves a new SOTA result with an F0.5 score of 76.05 on BEA-2019 (test), even without pretraining on synthetic datasets. In addition, we perform knowledge dis…
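The ensembling idea summarized in the abstract, majority voting over span-level edits, can be illustrated with a minimal sketch. The (start, end, replacement) edit representation and the strict-majority default threshold below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of span-level majority-vote ensembling for GEC.
# Assumption: each model's output is a list of (start, end, replacement)
# spans over the source tokens; the paper's actual edit format may differ.
from collections import Counter
from typing import List, Optional, Tuple

Edit = Tuple[int, int, str]  # (start token index, end token index, replacement text)

def majority_vote(edit_sets: List[List[Edit]], min_votes: Optional[int] = None) -> List[Edit]:
    """Keep edits proposed by at least `min_votes` of the ensembled models.

    Defaults to a strict majority. Because voting happens on span-level
    edits rather than on tag vocabularies, models with different
    architectures or vocabularies can be combined.
    """
    if min_votes is None:
        min_votes = len(edit_sets) // 2 + 1
    # Count each distinct edit once per model, then keep those with enough votes.
    counts = Counter(edit for edits in edit_sets for edit in set(edits))
    return sorted(e for e, c in counts.items() if c >= min_votes)

# Hypothetical example: three models correct "He go to school yesterday ."
model_a = [(1, 2, "went")]
model_b = [(1, 2, "went"), (4, 5, "")]
model_c = [(1, 2, "goes")]
print(majority_vote([model_a, model_b, model_c]))  # [(1, 2, 'went')]
```

Only the edit shared by two of the three models survives the vote; disagreements between models are dropped rather than arbitrated.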

Cited by 1 publication (1 citation statement)
References 14 publications
“…NMT approaches, which achieve state-of-the-art results, are encoder-decoder methods in which the encoder and decoder can have different architectures, such as RNNs, CNNs (Gehring et al., 2016), or Transformers (Vaswani et al., 2017), all of which have been applied successfully to the GEC task (Yuan and Briscoe, 2016; Yuan et al., 2019; Junczys-Dowmunt et al., 2018). Recent approaches achieve state-of-the-art results by only fine-tuning pre-trained large language models (Rothe et al., 2021; Tarnavskyi et al., 2022), easing the data requirements of large networks.…”
Section: Approaches (mentioning)
confidence: 99%