Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.391

Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction

Abstract: This paper investigates how to effectively incorporate a pre-trained masked language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for grammatical error correction (GEC). The answer to this question is not as straightforward as one might expect because the previous common methods for incorporating a MLM into an EncDec model have potential drawbacks when applied to GEC. For example, the distribution of the inputs to a GEC model can be considerably different (erroneous, clumsy, etc.) from that…
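To picture the kind of incorporation the abstract refers to, here is a minimal, illustrative sketch only, assuming a simple "concatenate-and-project" fusion of a frozen MLM's hidden states with a task-specific Transformer encoder's states; the class name MLMFusedEncoder, the bert-base-cased checkpoint, and the single linear fusion layer are assumptions for illustration, not the architecture proposed in the paper.

```python
# Illustrative sketch (not the paper's exact method): fuse a frozen pre-trained
# MLM's contextual representations with a task-specific Transformer encoder for GEC.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class MLMFusedEncoder(nn.Module):
    """Hypothetical encoder that concatenates MLM states with its own states."""

    def __init__(self, d_model=512, mlm_name="bert-base-cased"):
        super().__init__()
        self.mlm = BertModel.from_pretrained(mlm_name)      # pre-trained MLM (e.g., BERT)
        for p in self.mlm.parameters():                      # kept frozen in this sketch
            p.requires_grad = False
        self.embed = nn.Embedding(self.mlm.config.vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.proj = nn.Linear(self.mlm.config.hidden_size, d_model)
        self.fuse = nn.Linear(2 * d_model, d_model)          # concatenate, then project back

    def forward(self, input_ids, attention_mask):
        # Task encoder states over the (possibly erroneous) source sentence.
        enc = self.encoder(self.embed(input_ids),
                           src_key_padding_mask=~attention_mask.bool())
        # Contextual states from the pre-trained MLM on the same input.
        mlm_states = self.mlm(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Fused representation, which a standard Transformer decoder would attend to.
        return self.fuse(torch.cat([enc, self.proj(mlm_states)], dim=-1))

tok = BertTokenizer.from_pretrained("bert-base-cased")
batch = tok(["He go to school yesterday ."], return_tensors="pt", padding=True)
states = MLMFusedEncoder()(batch["input_ids"], batch["attention_mask"])  # (1, seq_len, 512)
```

The paper itself compares several incorporation strategies; the sketch above only illustrates the general idea of using the MLM's output as additional encoder features.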

Cited by 85 publications (39 citation statements)
References 31 publications
“…Recently, several methods that incorporate pretrained masked language models such as BERT, XLNet (Yang et al., 2019), and RoBERTa into EncDec-based GEC have been proposed and achieved remarkable results (Kaneko et al., 2020; Omelianchuk et al., 2020). These approaches modify the model architecture and do not directly compete with the data-driven approaches discussed in this study.…”
Section: Discussion
confidence: 97%
“…Since the reference of BEA-test is publicly unavailable, [13] See Appendix F for an ablation study of SED. [14] Improved results on the CoNLL-2014 and BEA-2019 have appeared on arXiv less than 3 months before our submission (Kaneko et al., 2020; Omelianchuk et al., 2020) and are considered contemporaneous to it. More detailed experimental results, including a comparison with them, are presented in Appendix E for reference.…”
Section: Comparison With Existing Models
confidence: 92%
“…Zhao et al (2019) further applied copying mechanism (Gu et al, 2016;Jia and Liang, 2016) to Transformer. Considering the tremendous performance of pre-trained methods, pretrained language model, such as BERT (Devlin et al, 2019), can be incorporated into the encoder-decoder model (Kaneko et al, 2020).…”
Section: Related Workmentioning
confidence: 99%
“…However, currently existing GEC models do not consider the generation of multiple correction candidates. Generally, in GEC, the method for obtaining multiple corrections involves the use of a plain beam search to generate the n-best candidates (Grundkiewicz et al., 2019; Kaneko et al., 2020). However, it has been shown that a plain beam search does not provide a great enough variety of candidates and produces lists of nearly identical sequences (Vijayakumar et al., 2018).…”
Section: Introduction
confidence: 99%
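For context on the beam-search point in the last statement, below is a minimal, hypothetical sketch of n-best candidate generation with plain beam search versus diverse beam search (Vijayakumar et al., 2018), using the Hugging Face transformers generate API; the t5-small checkpoint and the input sentence are placeholders rather than a GEC-trained model.

```python
# Hypothetical sketch: n-best generation with plain vs. diverse beam search.
# "t5-small" is a placeholder checkpoint, not a model trained for GEC.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tok("She go to school yesterday .", return_tensors="pt")

# Plain beam search: the n-best list often contains nearly identical sequences.
plain = model.generate(**inputs, num_beams=8, num_return_sequences=8,
                       max_new_tokens=32)

# Diverse beam search: beams are split into groups and penalized for repeating
# tokens chosen by other groups, which spreads the candidates out.
diverse = model.generate(**inputs, num_beams=8, num_beam_groups=4,
                         diversity_penalty=0.5, num_return_sequences=8,
                         max_new_tokens=32)

for seq in diverse:
    print(tok.decode(seq, skip_special_tokens=True))
```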