Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications 2019
DOI: 10.18653/v1/w19-4422
TMU Transformer System Using BERT for Re-ranking at BEA 2019 Grammatical Error Correction on Restricted Track

Abstract: We introduce our system submitted to the restricted track of the BEA 2019 shared task on grammatical error correction (GEC). It is essential to select an appropriate hypothesis sentence from the candidate list generated by the GEC model. A re-ranker can evaluate the naturalness of a corrected sentence using language models trained on large corpora. On the other hand, these language models and language representations do not explicitly take into account the grammatical errors written by learners. Thu…
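The abstract describes picking the best hypothesis from the GEC model's candidate list with a re-ranker that scores how natural each corrected sentence is under a language model. The sketch below illustrates one such re-ranker using a BERT masked-LM pseudo-log-likelihood; it is not the authors' exact system, and the model name, the length normalization, and the example candidates are assumptions for illustration.

```python
# Hedged sketch: re-rank GEC hypotheses by a BERT masked-LM
# pseudo-log-likelihood (length-normalized). Not the paper's exact method.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")  # model choice is an assumption
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log P(token | rest), masking each token in turn,
    normalized by the number of scored tokens."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip the [CLS] (first) and [SEP] (last) positions.
    for i in range(1, ids.size(0) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[ids[i]].item()
    return total / (ids.size(0) - 2)

def rerank(hypotheses):
    """Return candidate corrections sorted from most to least natural."""
    return sorted(hypotheses, key=pseudo_log_likelihood, reverse=True)

# Usage: pick the most natural correction among beam candidates.
candidates = ["He go to school every day .",
              "He goes to school every day ."]
print(rerank(candidates)[0])
```

In practice such a language-model score would typically be combined with the GEC model's own beam scores; the sketch ranks by the masked-LM score alone.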

Cited by 13 publications (12 citation statements)
References 18 publications
“…With the proposal of Transformer [52], many NMT-based GEC models replaced the traditional RNN-based encoder-decoder with Transformer [53], [54], [55], [56], [57], [58], [59], [60], [61]. Transformer first encodes the source sentence into a hidden state through a stack of several identical blocks, each consisting of a multi-head self-attention layer and a feed-forward layer.…”
Section: Development of NMT-Based Approaches (mentioning)
confidence: 99%
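The quoted passage summarizes the Transformer encoder used by NMT-based GEC systems: a stack of identical blocks, each combining multi-head self-attention with a feed-forward sub-layer. Below is a minimal PyTorch sketch of that structure; the dimensions, layer count, and the omission of positional encodings are simplifications, not details taken from the cited systems.

```python
# Minimal sketch of a Transformer encoder: N identical blocks, each with
# multi-head self-attention plus a position-wise feed-forward sub-layer.
# Hyperparameters are assumptions; positional encodings are omitted for brevity.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, pad_mask=None):
        # Self-attention sub-layer with residual connection and layer norm.
        attn_out, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + self.drop(attn_out))
        # Feed-forward sub-layer with residual connection and layer norm.
        return self.norm2(x + self.drop(self.ff(x)))

class Encoder(nn.Module):
    """Encodes a source sentence into hidden states via a stack of identical blocks."""
    def __init__(self, vocab_size, n_layers=6, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList([EncoderBlock(d_model) for _ in range(n_layers)])

    def forward(self, tokens, pad_mask=None):
        x = self.embed(tokens)          # (batch, seq_len, d_model)
        for block in self.blocks:
            x = block(x, pad_mask)
        return x                        # hidden states consumed by the decoder
```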
“…The score is calculated by summing all the n-gram log probabilities and then normalizing by the length of the hypothesis. Many systems train a 5-gram language model [50], [103], [61]. Besides, the masked language model probabilities computed by BERT can also be used for reranking [66].…”
Section: Features (mentioning)
confidence: 99%
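The feature described above is a length-normalized language-model score: sum the n-gram log probabilities over the hypothesis, then divide by its length. The sketch below illustrates the computation with a toy count-based bigram model standing in for the 5-gram models the quote mentions; the add-alpha smoothing, vocabulary size, and example corpus are assumptions.

```python
# Hedged sketch of a length-normalized n-gram LM score for a GEC hypothesis.
# A toy bigram model stands in for a trained 5-gram model.
import math
from collections import Counter

def train_ngram_counts(corpus, n=2):
    grams, contexts = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] * (n - 1) + sent.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            grams[tuple(tokens[i:i + n])] += 1
            contexts[tuple(tokens[i:i + n - 1])] += 1
    return grams, contexts

def normalized_lm_score(hyp, grams, contexts, n=2, vocab=10_000, alpha=0.1):
    """Sum of add-alpha smoothed n-gram log probabilities, divided by hypothesis length."""
    tokens = ["<s>"] * (n - 1) + hyp.split() + ["</s>"]
    logp = 0.0
    for i in range(len(tokens) - n + 1):
        gram, ctx = tuple(tokens[i:i + n]), tuple(tokens[i:i + n - 1])
        logp += math.log((grams[gram] + alpha) / (contexts[ctx] + alpha * vocab))
    return logp / len(hyp.split())  # normalize by hypothesis length

# Usage: the grammatical hypothesis should receive the higher (less negative) score.
corpus = ["he goes to school", "she goes to work"]
grams, contexts = train_ngram_counts(corpus)
for hyp in ["he go to school", "he goes to school"]:
    print(hyp, normalized_lm_score(hyp, grams, contexts))
```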
“…Leshem et al [9] provided another measurement for meaning preservation using a semantic annotation scheme. Large-scale pre-trained models such as BERT [10] bring opportunities to improve the performance of GEC, demonstrating effectiveness in context learning and enabling better quality estimation [11]. In terms of natural language generation tasks, the interaction between multiple hypotheses has an important impact on quality estimation [12], but existing quality estimation of GEC outputs does not consider the interaction among hypotheses.…”
Section: Introduction (mentioning)
confidence: 99%