Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing 2014
DOI: 10.3115/v1/w14-6835

Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape

Abstract: Spelling check is an important preprocessing task when dealing with user-generated texts such as tweets and product comments. Compared with some Western languages such as English, Chinese spelling check is more complex because there is no word delimiter in Chinese written texts and misspelled characters can only be determined at the word level. Our system works as follows. First, we use character-level n-gram language models to detect potential misspelled characters with low probabilities below some predefined thr…
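
The detection step named in the abstract (flagging characters that a character-level n-gram language model scores below a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the bigram order, the add-one smoothing, the averaging of left and right bigram scores, the toy corpus, and the threshold value of -2.3 are all assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count character unigrams and bigrams, with sentence boundary marks."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        chars = ["<s>"] + list(sent) + ["</s>"]
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    return unigrams, bigrams

def bigram_logprob(prev, cur, unigrams, bigrams):
    """Add-one smoothed log P(cur | prev) (smoothing choice is an assumption)."""
    vocab = len(unigrams)
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))

def flag_suspects(sentence, unigrams, bigrams, threshold):
    """Return indices of characters whose mean left/right bigram
    log-probability falls below the (hypothetical) threshold."""
    chars = ["<s>"] + list(sentence) + ["</s>"]
    suspects = []
    for i in range(1, len(chars) - 1):
        left = bigram_logprob(chars[i - 1], chars[i], unigrams, bigrams)
        right = bigram_logprob(chars[i], chars[i + 1], unigrams, bigrams)
        if (left + right) / 2 < threshold:
            suspects.append(i - 1)  # index into the original sentence
    return suspects

# Toy training corpus (illustrative only; a real system needs a large corpus).
corpus = ["我喜欢吃苹果", "我喜欢喝茶", "他喜欢吃苹果"]
unigrams, bigrams = train_bigram(corpus)

# "换" is a plausible misspelling of "欢" (similar pronunciation).
print(flag_suspects("我喜换吃苹果", unigrams, bigrams, threshold=-2.3))
print(flag_suspects("我喜欢吃苹果", unigrams, bigrams, threshold=-2.3))
```

In a fuller pipeline, each flagged position would then be passed to a correction step, which per the title draws on pronunciation and shape similarity; that step is not sketched here because the abstract is truncated before describing it.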

Cited by 79 publications (69 citation statements)
References 8 publications

“…Most CSC related studies have emerged as a result of a series of shared tasks (Wu et al, 2013; Tseng et al, 2015; Fung et al, 2017; Gaoqi et al, 2018), which involve automatic detection and correction of spelling errors for a given sentence. Earlier work in CSC focus mainly on unsupervised methods such as language model with a pre-constructed confusion set (Yu and Li, 2014). Subsequently, some work cast CSC as a sequential labeling problem, in which conditional random fields (CRF) (Lafferty et al, 2001), gated recurrent networks (Hochreiter and Schmidhuber, 1997; Chung et al, 2014) have been employed to model the problem (Zheng et al, 2016; Xie et al, 2017; Wu et al, 2018).…”
Section: Related Work (mentioning)
confidence: 99%
“…Ref [15] proposed that text correction requires two steps of error detection and error correction. Traditional OCR corrections are more often use language models and confusion matrices [6], [16], [17]. However, after the document OCR, the text will be lost due to occlusion, watermark, etc.…”
Section: Related Work (mentioning)
confidence: 99%
“…Misspelling detection research is very limited, small scale and often on domain specific private data (Zamora et al, 1981). Approaches for misspelling detection primarily involve use of a predefined dictionary of n-grams (Zamora et al, 1981) and words (Dalkiliç and Çebi, 2009; Yu and Li, 2014; Attia et al, 2012). Additionally, dictionaries used are limited to specific languages like Chinese (Yu and Li, 2014), Turkish (Dalkiliç and Çebi, 2009) and Arabic (Attia et al, 2012).…”
Section: Related Work (mentioning)
confidence: 99%
“…Leveraging existing approaches for misspelling detection from product images is beset with a number of challenges. First, although spelling research has intrigued the NLP community for long (Damerau, 1964; Kukich, 1992), misspelling detection research (Zamora et al, 1981; Dalkiliç and Çebi, 2009; Attia et al, 2012; Yu and Li, 2014) is very sparse, language specific and the primary approach has remained a dictionary lookup. This approach does not scale or generalize to billions of product images leading to a large number of false positive detections.…”
Section: Introduction (mentioning)
confidence: 99%