Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
DOI: 10.18653/v1/2021.acl-long.233
PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction

Abstract: Chinese spelling correction (CSC) is a task to detect and correct spelling errors in texts. CSC is essentially a linguistic problem, thus the ability of language understanding is crucial to this task. In this paper, we propose a Pre-trained masked Language mOdel with Misspelled knowledgE (PLOME) for CSC, which jointly learns how to understand language and correct spelling errors. To this end, PLOME masks the chosen tokens with similar characters according to a confusion set rather than the fixed token "[MASK]"…
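As a rough illustration of the confusion-set masking idea described in the abstract, the sketch below replaces selected characters with similar ones instead of a fixed "[MASK]" token. The confusion set, masking rate, and function names are illustrative assumptions, not PLOME's released implementation.

```python
# Illustrative sketch of confusion-set masking (assumed details, not PLOME's code).
import random

# Toy confusion set: each character maps to phonologically/visually similar ones.
CONFUSION_SET = {
    "的": ["地", "得"],
    "在": ["再"],
    "做": ["作"],
}

def mask_with_confusion_set(tokens, mask_rate=0.15):
    """Return (masked_tokens, labels); None marks positions the loss ignores."""
    masked, labels = [], []
    for tok in tokens:
        if tok in CONFUSION_SET and random.random() < mask_rate:
            # Replace with a similar character instead of "[MASK]", so the model
            # learns to recover the original token from a plausible misspelling.
            masked.append(random.choice(CONFUSION_SET[tok]))
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

print(mask_with_confusion_set(list("我在做作业")))
```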

Cited by 58 publications (59 citation statements)
References 21 publications
“…Works utilizing the BERT [89] encoder can utilize, or supplement, the default masking [MASK] token. The authors of [90] also used related words from confusion sets, while the authors of [91] replaced them with phonologically and visually similar ones.…”
Section: Transformer Models for Spelling Error Correction
confidence: 99%
“…These are concatenated with word embeddings and are used in the final word encoder. For the Chinese language, [91] additionally added phonetic and shape embeddings acquired from separately-trained single-layer GRU [92] networks. Parallel to the character classification, authors also performed pronunciation prediction.…”
Section: Transformer Models for Spelling Error Correction
confidence: 99%
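The excerpt above describes phonetic and shape embeddings obtained from single-layer GRU networks, fused with character embeddings, with pronunciation prediction performed in parallel with character classification. The following PyTorch sketch captures that idea under assumed vocabulary sizes, dimensions, and a simple concatenation-plus-linear-heads fusion; in the actual model the fused embeddings feed a transformer encoder, which is omitted here for brevity.

```python
# Minimal sketch: GRU-based phonetic/shape embeddings and two parallel heads.
# All names and sizes are assumptions for illustration, not PLOME's exact design.
import torch
import torch.nn as nn

class CharWithPhoneticShape(nn.Module):
    def __init__(self, char_vocab=8000, pinyin_vocab=30, stroke_vocab=6,
                 pron_vocab=430, dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, dim)
        self.pinyin_emb = nn.Embedding(pinyin_vocab, dim)
        self.stroke_emb = nn.Embedding(stroke_vocab, dim)
        # Single-layer GRUs encode a character's pinyin-letter and stroke sequences.
        self.phonetic_gru = nn.GRU(dim, dim, num_layers=1, batch_first=True)
        self.shape_gru = nn.GRU(dim, dim, num_layers=1, batch_first=True)
        # Parallel heads: one predicts the correct character, one its pronunciation.
        self.char_head = nn.Linear(3 * dim, char_vocab)
        self.pron_head = nn.Linear(3 * dim, pron_vocab)

    def forward(self, char_ids, pinyin_ids, stroke_ids):
        # char_ids: (batch,); pinyin_ids, stroke_ids: (batch, seq_len)
        _, h_pho = self.phonetic_gru(self.pinyin_emb(pinyin_ids))
        _, h_shp = self.shape_gru(self.stroke_emb(stroke_ids))
        fused = torch.cat([self.char_emb(char_ids), h_pho[-1], h_shp[-1]], dim=-1)
        return self.char_head(fused), self.pron_head(fused)
```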

Correcting diacritics and typos with a ByT5 transformer model

Stankevičius, Lukoševičius, Kapočiūtė-Dzikienė et al., 2022 (Preprint)
“…To alleviate this problem, large-scale unlabelled corpus was used to enhance the spelling check ability of models. Some works [6,17,28,35] adopted multimodal-based methods to build corpora automatically for weakly supervised learning. Although some of these methods [28] work well, they still lack the consideration of the consistency between the real datasets and the automatically generated corpora, which will bring a big gap.…”
Section: Introduction
confidence: 99%
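The corpus-construction step mentioned in this excerpt can be sketched as corrupting clean sentences with a confusion set to produce (misspelled, correct) training pairs; the substitution rate is a parameter one could estimate from a small labelled set to keep synthetic errors closer to real ones, which speaks to the consistency gap noted above. The confusion set, rate, and function names below are illustrative assumptions, not the procedure of any cited work.

```python
# Hedged sketch of building a weakly supervised CSC corpus from clean text.
import random

CONFUSION_SET = {"在": ["再"], "做": ["作"], "的": ["地", "得"]}

def corrupt(sentence, sub_rate=0.1):
    """Return a corrupted copy of `sentence` for use as the noisy model input."""
    chars = list(sentence)
    for i, ch in enumerate(chars):
        if ch in CONFUSION_SET and random.random() < sub_rate:
            chars[i] = random.choice(CONFUSION_SET[ch])
    return "".join(chars)

def build_pairs(clean_sentences, sub_rate=0.1):
    # Each pair is (noisy input, clean target); sub_rate could be tuned against a
    # small labelled set so the synthetic error distribution resembles real data.
    return [(corrupt(s, sub_rate), s) for s in clean_sentences]

print(build_pairs(["我在做作业", "他的书很好"]))
```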