Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.198

Correcting Chinese Spelling Errors with Phonetic Pre-training

Abstract: Chinese spelling correction (CSC) is an important yet challenging task. Existing state-of-the-art methods either only use a pre-trained language model or incorporate phonological information as external knowledge. In this paper, we propose a novel end-to-end CSC model that integrates phonetic features into a language model by leveraging the powerful pre-training and fine-tuning method. Instead of conventionally masking words with a special token when training the language model, we replace words with phonetic features …

Citations: cited by 44 publications (26 citation statements)
References: 16 publications
“…There are also task-specific BERT models, especially for the task of grammatical error correction since an important type of error is caused by characters pronounced with the same pinyin. Zhang et al (2021a) add a pinyin embedding layer and learns to predict characters from similarly pronounced candidates. PLOME add two embedding layers implemented with two GRU networks to inject both pinyin and shape of characters, respectively.…”
Section: Related Work (mentioning)
confidence: 99%
“…As most spelling errors come from similar pronunciations or glyphs, Many successive studies merged the similarity knowledge into spellers. Nguyen et al [21] adopted glyph features while Cheng et al [2], Xu et al [32], Zhang et al [35] employed phonetic information. Researcher also explored the mixture method of these similarities, such as an adaptive gating module [32], GCN [2] and multi-modal [12].…”
Section: Chinese Spelling Check (mentioning)
confidence: 99%
“…Since almost all the spelling errors are related to phonological or visual similarity [16], many subsequent studies incorporated the similarity knowledge into spellers. For example, Nguyen et al [21] utilized glyph information while Cheng et al [2], Xu et al [32], Zhang et al [35] employed phonetic features. The fusion method of these similarities has also been explored, such as Graph Convolution Network (GCN) [2] and multi-modal [12,32].…”
Section: Introduction (mentioning)
confidence: 99%