Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.391

Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction

Abstract: This paper investigates how to effectively incorporate a pre-trained masked language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for grammatical error correction (GEC). The answer to this question is not as straightforward as one might expect because the previous common methods for incorporating a MLM into an EncDec model have potential drawbacks when applied to GEC. For example, the distribution of the inputs to a GEC model can be considerably different (erroneous, clumsy, etc.) from that…
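To picture the kind of incorporation the abstract refers to, here is a minimal, illustrative sketch only, assuming a simple "concatenate-and-project" fusion of a frozen MLM's hidden states with a task-specific Transformer encoder's states; the class name MLMFusedEncoder, the bert-base-cased checkpoint, and the single linear fusion layer are assumptions for illustration, not the architecture proposed in the paper.

```python
# Illustrative sketch (not the paper's exact method): fuse a frozen pre-trained
# MLM's contextual representations with a task-specific Transformer encoder for GEC.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class MLMFusedEncoder(nn.Module):
    """Hypothetical encoder that concatenates MLM states with its own states."""

    def __init__(self, d_model=512, mlm_name="bert-base-cased"):
        super().__init__()
        self.mlm = BertModel.from_pretrained(mlm_name)      # pre-trained MLM (e.g., BERT)
        for p in self.mlm.parameters():                      # kept frozen in this sketch
            p.requires_grad = False
        self.embed = nn.Embedding(self.mlm.config.vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.proj = nn.Linear(self.mlm.config.hidden_size, d_model)
        self.fuse = nn.Linear(2 * d_model, d_model)          # concatenate, then project back

    def forward(self, input_ids, attention_mask):
        # Task encoder states over the (possibly erroneous) source sentence.
        enc = self.encoder(self.embed(input_ids),
                           src_key_padding_mask=~attention_mask.bool())
        # Contextual states from the pre-trained MLM on the same input.
        mlm_states = self.mlm(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Fused representation, which a standard Transformer decoder would attend to.
        return self.fuse(torch.cat([enc, self.proj(mlm_states)], dim=-1))

tok = BertTokenizer.from_pretrained("bert-base-cased")
batch = tok(["He go to school yesterday ."], return_tensors="pt", padding=True)
states = MLMFusedEncoder()(batch["input_ids"], batch["attention_mask"])  # (1, seq_len, 512)
```

The paper itself compares several incorporation strategies; the sketch above only illustrates the general idea of using the MLM's output as additional encoder features.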

Cited by 85 publications (39 citation statements)
References 31 publications
“…Recently, several methods that incorporate pretrained masked language models such as BERT, XLNet (Yang et al., 2019), and RoBERTa into EncDec-based GEC have been proposed and achieved remarkable results (Kaneko et al., 2020; Omelianchuk et al., 2020). These approaches modify the model architecture and do not directly compete with the data-driven approaches discussed in this study.…”
Section: Discussion
confidence: 97%
“…Since the reference of BEA-test is publicly unavailable, [13] See Appendix F for an ablation study of SED. [14] Improved results on the CoNLL-2014 and BEA-2019 have appeared on arXiv less than 3 months before our submission (Kaneko et al., 2020; Omelianchuk et al., 2020) and are considered contemporaneous to it. More detailed experimental results, including a comparison with them, are presented in Appendix E for reference.…”
Section: Comparison With Existing Models
confidence: 92%
“…Zhao et al (2019) further applied copying mechanism (Gu et al, 2016;Jia and Liang, 2016) to Transformer. Considering the tremendous performance of pre-trained methods, pretrained language model, such as BERT (Devlin et al, 2019), can be incorporated into the encoder-decoder model (Kaneko et al, 2020).…”
Section: Related Workmentioning
confidence: 99%
“…However, currently existing GEC models do not consider the generation of multiple correction candidates. Generally, in GEC, the method for obtaining multiple corrections involves the use of a plain beam search to generate the n-best candidates (Grundkiewicz et al., 2019; Kaneko et al., 2020). However, it has been shown that a plain beam search does not provide a great enough variety of candidates and produces lists of nearly identical sequences (Vijayakumar et al., 2018).…”
Section: Introduction
confidence: 99%
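For context on the beam-search point in the last statement, below is a minimal, hypothetical sketch of n-best candidate generation with plain beam search versus diverse beam search (Vijayakumar et al., 2018), using the Hugging Face transformers generate API; the t5-small checkpoint and the input sentence are placeholders rather than a GEC-trained model.

```python
# Hypothetical sketch: n-best generation with plain vs. diverse beam search.
# "t5-small" is a placeholder checkpoint, not a model trained for GEC.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tok("She go to school yesterday .", return_tensors="pt")

# Plain beam search: the n-best list often contains nearly identical sequences.
plain = model.generate(**inputs, num_beams=8, num_return_sequences=8,
                       max_new_tokens=32)

# Diverse beam search: beams are split into groups and penalized for repeating
# tokens chosen by other groups, which spreads the candidates out.
diverse = model.generate(**inputs, num_beams=8, num_beam_groups=4,
                         diversity_penalty=0.5, num_return_sequences=8,
                         max_new_tokens=32)

for seq in diverse:
    print(tok.decode(seq, skip_special_tokens=True))
```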