Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1070

A Nested Attention Neural Hybrid Model for Grammatical Error Correction

Abstract: Grammatical error correction (GEC) systems strive to correct both global errors in word order and usage, and local errors in spelling and inflection. Further developing upon recent work on neural machine translation, we propose a new hybrid neural model with nested attention layers for GEC. Experiments show that the new model can effectively correct errors of both types by incorporating word and character-level information, and that the model significantly outperforms previous neural models for GEC as measured…
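
The model couples a word-level encoder-decoder with character-level processing, with attention applied at both granularities. As a rough illustration of the nested-attention idea only (a minimal sketch, not the authors' implementation; the class name, tensor shapes, and the single shared dimensionality for word and character annotations are assumptions):

```python
# Hypothetical sketch of nested (word- plus character-level) attention.
# Not the authors' code: the class name, shapes, and the shared
# dimensionality for word and character annotations are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NestedAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.word_score = nn.Linear(2 * dim, 1)  # scores (decoder state, word annotation)
        self.char_score = nn.Linear(2 * dim, 1)  # scores (decoder state, char annotation)

    def forward(self, dec_state, word_ann, char_ann, oov_mask):
        # dec_state: (B, D)         current decoder state
        # word_ann:  (B, Tw, D)     word-level encoder annotations
        # char_ann:  (B, Tw, Tc, D) per-source-word character annotations
        # oov_mask:  (B, Tw) bool   True where the source word is OOV
        B, Tw, D = word_ann.shape
        Tc = char_ann.shape[2]

        # Inner (character-level) attention, nested within each word slot.
        d = dec_state[:, None, None, :].expand(B, Tw, Tc, D)
        c_logits = self.char_score(torch.cat([d, char_ann], -1)).squeeze(-1)
        c_alpha = F.softmax(c_logits, dim=2)                 # (B, Tw, Tc)
        char_ctx = (c_alpha[..., None] * char_ann).sum(2)    # (B, Tw, D)

        # OOV words are represented by their character-level context,
        # in-vocabulary words by their word-level annotation.
        ann = torch.where(oov_mask[..., None], char_ctx, word_ann)

        # Outer (word-level) attention over the mixed annotations.
        dw = dec_state[:, None, :].expand(B, Tw, D)
        w_logits = self.word_score(torch.cat([dw, ann], -1)).squeeze(-1)
        w_alpha = F.softmax(w_logits, dim=1)                 # (B, Tw)
        return (w_alpha[..., None] * ann).sum(1)             # context: (B, D)
```

The outer attention distributes weight over source positions as usual, while the inner attention lets the decoder look inside individual, especially out-of-vocabulary, source words; this is what targets local spelling and inflection errors.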

Cited by 91 publications (90 citation statements, 2018–2022) · References 14 publications

“…Yannakoudakis et al. (2017) developed a neural sequence-labeling model for error detection that estimates the probability of each token in a sentence being correct or incorrect, and then used the error-detection model's output as a feature to re-rank the N-best hypotheses. Ji et al. (2017) proposed a hybrid neural model incorporating both word- and character-level information. Chollampatt and Ng (2018) used a multilayer convolutional encoder-decoder neural network and outperformed all prior neural and statistical systems on this task.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
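
To make the re-ranking step above concrete, here is a hypothetical sketch of combining a base decoder score with per-token error probabilities; the feature definition, the weight, and the toy_detector stub are assumptions, not the published configuration:

```python
# Hypothetical sketch of re-ranking N-best hypotheses with a token-level
# error-detection feature; the aggregation and weight are assumptions.
from typing import Callable, List, Tuple

def rerank(
    hypotheses: List[Tuple[str, float]],       # (hypothesis, base decoder score)
    error_prob: Callable[[str], List[float]],  # per-token P(error) from a detector
    weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Add a 'mean token correctness' feature to each base score."""
    rescored = []
    for hyp, base_score in hypotheses:
        probs = error_prob(hyp)
        correctness = sum(1.0 - p for p in probs) / max(len(probs), 1)
        rescored.append((hyp, base_score + weight * correctness))
    return sorted(rescored, key=lambda x: x[1], reverse=True)

# Toy usage with a stub detector that flags the token "an" as likely wrong.
def toy_detector(sentence: str) -> List[float]:
    return [0.9 if tok == "an" else 0.1 for tok in sentence.split()]

ranked = rerank([("he has an dog", -2.1), ("he has a dog", -2.2)], toy_detector)
print(ranked[0][0])  # -> "he has a dog"
```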
“…We use a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) architecture for sentence classification, with dynamic attention over words for constructing the sentence representations. Related architectures have been successful for machine translation (Bahdanau et al., 2015), sentence summarization (Rush et al., 2015), entailment detection (Rocktäschel et al., 2016), and error correction (Ji et al., 2017). In this work, we modify the attention mechanism and training objective so that the resulting network is also suited to inferring binary token labels, while still performing well as a sentence classifier.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
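
The modification described, attention logits that double as binary token-label predictions, can be sketched roughly as follows (a simplification, not the cited paper's exact architecture or objective; layer names and sizes are assumptions):

```python
# Hypothetical sketch: a BiLSTM sentence classifier whose per-token
# attention logits are also trained as binary token labels.
import torch
import torch.nn as nn

class AttentiveLabeler(nn.Module):
    def __init__(self, vocab_size: int, emb: int = 100, hid: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.token_score = nn.Linear(2 * hid, 1)  # shared token logit
        self.classify = nn.Linear(2 * hid, 1)     # sentence-level logit

    def forward(self, tokens):                    # tokens: (B, T) int64
        h, _ = self.lstm(self.embed(tokens))      # (B, T, 2*hid)
        token_logits = self.token_score(h).squeeze(-1)  # (B, T)
        # The same logits serve two roles: sigmoid(token_logits) is trained
        # against binary token labels, while their softmax normalization
        # weights the token states into the sentence representation.
        alpha = torch.softmax(token_logits, dim=1)
        sentence = (alpha.unsqueeze(-1) * h).sum(1)     # (B, 2*hid)
        return self.classify(sentence).squeeze(-1), token_logits
```

Training would then combine a sentence-level loss on the first output with a binary cross-entropy loss on the token logits.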
“…The final sentence-level representation c is then fed into a logistic regression layer to predict the category. Another type of hierarchical attention takes a top-down approach; an example is grammatical error correction (Ji et al., 2017). Consider a corrupted sentence: I have no enough previleges.…”
Section: Hierarchical Attention
Citation type: mentioning (confidence: 99%)
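
For the bottom-up variant described first, a minimal sketch of attention-pooling token states into the sentence representation c and feeding it to a logistic regression layer (all names and sizes here are assumptions):

```python
# Minimal sketch: attention-pool token states into a sentence
# representation c, then classify c with a logistic regression layer.
# Class and parameter names are assumptions for illustration.
import torch
import torch.nn as nn

class AttentionPoolClassifier(nn.Module):
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))  # learned attention query
        self.logreg = nn.Linear(dim, num_classes)    # logistic regression layer

    def forward(self, states):          # states: (B, T, D) token representations
        scores = states @ self.query    # (B, T) relevance of each token
        alpha = torch.softmax(scores, dim=1)
        c = (alpha.unsqueeze(-1) * states).sum(1)    # sentence representation c
        return self.logreg(c)           # category logits (cross-entropy outside)
```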