CYUT-III Team Chinese Grammatical Error Diagnosis System Report in NLPTEA-2018 CGED Shared Task

Wu, Shih-Hung; Wang, Junwei; Chen, Liang-Pu; Yang, Ping-Che

doi:10.18653/v1/w18-3729

Cited by 5 publications (2 citation statements)

References 10 publications


Section: Related Work (mentioning)

“…Earlier work in CSC focused mainly on unsupervised methods, such as a language model with a pre-constructed confusionset (…; Yu and Li, 2014). Subsequently, some work cast CSC as a sequence labeling problem, in which conditional random fields (CRF) (Lafferty et al, 2001) and gated recurrent networks (Hochreiter and Schmidhuber, 1997; Chung et al, 2014) have been employed to model the problem (Zheng et al, 2016; Xie et al, 2017; Wu et al, 2018). More recently, motivated by a series of remarkable successes achieved by neural network-based sequence-to-sequence learning (Seq2Seq) in various natural language processing (NLP) tasks (Sutskever et al, 2014; …), generative models have also been applied to spelling check by treating it as an encoder-decoder task (Xie et al, 2016; Ge et al, 2018).…”
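The unsupervised approach mentioned in the quote (a language model combined with a pre-constructed confusionset) can be illustrated with a minimal sketch. It assumes a character bigram model with add-one smoothing and a greedy left-to-right substitution policy; the corpus, the confusionset, and all function names are hypothetical toy choices, not the method of any cited paper.

```python
import math
from collections import Counter

# Hypothetical toy data; a real system would train on a large corpus and
# use a published confusionset.
CORPUS = ["我在学校学习", "他在学校", "我们再见"]
CONFUSION_SET = {"再": ["在"], "在": ["再"]}


def train_bigram_lm(corpus):
    """Count character unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        chars = ["<s>"] + list(sent) + ["</s>"]
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    return unigrams, bigrams


def log_prob(chars, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a character sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + list(chars) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def correct(sentence, confusion_set, unigrams, bigrams):
    """Left to right, replace each character with the confusionset candidate
    that most improves the sentence score, or keep it if none does."""
    chars = list(sentence)
    for i in range(len(chars)):
        best_char = chars[i]
        best_score = log_prob(chars, unigrams, bigrams)
        for cand in confusion_set.get(chars[i], []):
            trial = chars[:i] + [cand] + chars[i + 1:]
            score = log_prob(trial, unigrams, bigrams)
            if score > best_score:
                best_char, best_score = cand, score
        chars[i] = best_char
    return "".join(chars)


unigrams, bigrams = train_bigram_lm(CORPUS)
print(correct("我再学校学习", CONFUSION_SET, unigrams, bigrams))  # 我在学校学习
```

The greedy policy keeps the sketch short; a fuller system would typically score several candidate positions jointly and apply a threshold before accepting a substitution.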

Confusionset-guided Pointer Networks for Chinese Spelling Check

Wang, Tay, and Zhong (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics


This paper proposes Confusionset-guided Pointer Networks for the Chinese Spelling Check (CSC) task. More concretely, our approach uses an off-the-shelf confusionset to guide character generation. To this end, our novel Seq2Seq model jointly learns either to copy a correct character from the input sentence through a pointer network or to generate a character from the confusionset rather than from the entire vocabulary. We conduct experiments on three human-annotated datasets, and the results demonstrate that our generative model outperforms all competitor models by a large margin of up to 20% F1, achieving state-of-the-art performance on all three datasets.
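The copy-or-generate step described in this abstract can be sketched as a pointer-generator style mixture at a single decoding step. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the attention weights over the source, the generation logits, and a copy probability `p_copy` have already been produced by the model, and that generation is restricted to the confusionset of the aligned input character. All names and values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a dict of logits."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def output_distribution(source, attention, gen_logits, confusion_set,
                        position, p_copy):
    """Distribution over output characters at one decoding step (hypothetical).

    Mixes a copy (pointer) distribution over the source characters with a
    generation distribution restricted to the confusionset of the aligned
    source character source[position].
    """
    candidates = confusion_set.get(source[position], [])
    if not candidates:
        p_copy = 1.0  # nothing to generate from, so copy only
    dist = {}
    # Pointer part: attention mass on each source position goes to its character.
    for ch, weight in zip(source, attention):
        dist[ch] = dist.get(ch, 0.0) + p_copy * weight
    # Generation part: softmax over the confusionset only, not the vocabulary.
    if candidates:
        gen = softmax({c: gen_logits.get(c, 0.0) for c in candidates})
        for ch, prob in gen.items():
            dist[ch] = dist.get(ch, 0.0) + (1.0 - p_copy) * prob
    return dist


# Hypothetical step: decoding position 1 of "我再学校", where 再 is suspicious.
dist = output_distribution("我再学校", [0.1, 0.7, 0.1, 0.1],
                           {"在": 2.0, "载": 0.0}, {"再": ["在", "载"]},
                           position=1, p_copy=0.3)
print(max(dist, key=dist.get))  # 在
```

Restricting the softmax to a handful of candidates is what keeps the generation branch small; characters outside both the source and the confusionset receive zero probability.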


Section: Introduction (mentioning)

“…With the development of end-to-end networks, some work proposed optimizing error correction performance directly by treating it as a sequence labeling task with conditional random fields (CRF) (Wu et al, 2018) and recurrent neural networks (RNN) (Zheng et al, 2016; Yang et al, 2017). Wang et al (2019) used a sequence-to-sequence framework with a copy mechanism to copy the correction results directly from a prepared confusion set for the erroneous words.…”

Correcting Chinese Spelling Errors with Phonetic Pre-training

Zhang, Pang, Zhang, et al. (2021)

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


Chinese spelling correction (CSC) is an important yet challenging task. Existing state-of-the-art methods either use only a pre-trained language model or incorporate phonological information as external knowledge. In this paper, we propose a novel end-to-end CSC model that integrates phonetic features into a language model by leveraging the powerful pre-training and fine-tuning method. Instead of masking words with a special token, as is conventional when training a language model, we replace words with their phonetic features and sound-alike words. We further propose an adaptive weighted objective to jointly train error detection and correction in a unified framework. Experimental results show that our model achieves significant improvements on the SIGHAN datasets and outperforms the previous state-of-the-art methods.
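The replacement-based masking described in this abstract can be illustrated with a small data-corruption sketch. The pinyin and sound-alike tables, the rates, and the function name are hypothetical, and the paper's exact corruption scheme is not given here; the sketch only shows the general idea of replacing a selected character with either its pinyin token or a homophone instead of a special mask token.

```python
import random

# Hypothetical toy tables; a real system would derive these from a pinyin
# lexicon and a confusion set.
PINYIN = {"在": "zai4", "再": "zai4", "学": "xue2", "校": "xiao4", "效": "xiao4"}
SOUND_ALIKE = {"在": ["再"], "再": ["在"], "校": ["效"], "效": ["校"]}


def phonetic_mask(chars, rng, mask_rate=0.15, pinyin_rate=0.5):
    """Corrupt a character sequence for pre-training (hypothetical scheme).

    Each selected character is replaced either by its pinyin token or by a
    sound-alike character rather than by a [MASK] token. Returns the corrupted
    sequence and the targets (original character, or None if not selected).
    """
    corrupted, targets = [], []
    for ch in chars:
        if ch in PINYIN and rng.random() < mask_rate:
            if ch in SOUND_ALIKE and rng.random() >= pinyin_rate:
                corrupted.append(rng.choice(SOUND_ALIKE[ch]))  # homophone
            else:
                corrupted.append(PINYIN[ch])  # phonetic token
            targets.append(ch)
        else:
            corrupted.append(ch)
            targets.append(None)
    return corrupted, targets


print(phonetic_mask("我在学校", random.Random(0), mask_rate=1.0, pinyin_rate=1.0))
# (['我', 'zai4', 'xue2', 'xiao4'], [None, '在', '学', '校'])
```

Because the corrupted input already resembles a real spelling error (a homophone) or its pronunciation, the model is trained on inputs closer to what it sees at correction time than a generic mask token would be.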


Pre-Training-Based Grammatical Error Correction Model for the Written Language of Chinese Hearing Impaired Students

Chen and Zhang (2022)

IEEE Access


Grammatical error correction is considered an application closely related to daily life and an important shared task in many prestigious competitions and workshops. Neural machine translation with an encoder-decoder architecture containing language models has been the fundamental solution for grammatical error correction. However, grammatical error correction for texts written by deaf people has not yet been studied, and common grammatical error correction tasks suffer from several challenges, such as insufficient training data and limited accuracy caused by models' inadequate capacity to extract semantic and grammatical patterns. Under these circumstances, we propose a novel encoder-decoder architecture based on multi-head self-attention along with multiple strategies, which excels at extracting deep representations from the corrupted sentences of deaf students and reconstructing them into grammatical sentences. Via a re-ranking strategy, our model can correct various kinds of errors, including spelling errors and complex syntax errors. The ablation experiments show that semantic extraction by self-attention without position encoding, combined with a word-order shuffle operation, can significantly improve learning of the deaf students' sentence patterns, whose word order differs considerably from that of hearing people, and improves the correction scores. Pre-training improves the efficiency of restoring sentence structure during decoding. Comparison experiments with baseline models show that our model achieves superior performance both on deaf students' grammatical error correction and on a common grammatical error correction shared task.
