2014
DOI: 10.1007/978-3-642-54903-8_43

State-of-the-Art in Weighted Finite-State Spell-Checking

Abstract: The following claims can be made about finite-state methods for spell-checking: 1) Finite-state language models provide support for morphologically complex languages that word lists, affix stripping and similar approaches do not provide; 2) Weighted finite-state models have expressive power equal to other, state-of-the-art string algorithms used by contemporary spell-checkers; and 3) Finite-state models are at least as fast as other string algorithms for lookup and error correction. In this article, we use som…

Cited by 25 publications (24 citation statements)
References: 15 publications
“…Traditional spelling correction techniques rely on the fact that most spelling errors are within a short edit-distance of their correct form Kukich (1992); Max & Wisniewski (2010). That is why spelling correction needs special treatment in case of MRLs, which by nature consist of longer words resulting in errors in longer edit distances and mostly due to the wrong spellings outside the word lemma (within the affixes) Ingason et al (2009); Pirinen & Lindén (2014); Pirinen et al (2010). In case of languages having rather shorter words than MRLs, the complete omission of diacritics and vowels would not be a severe problem and could be jointly solved with spelling correction.…”
Section: The Proposed Architecture (mentioning)
confidence: 99%
“…Figure 5 gives the general flow of the employed system. SC#4 is inspired by Linden and Pirinen 2014, in that it uses a language and an error model together in order to generate candidates. Candidates which are generated by the error model are validated using the language model and the best proposal is the candidate with minimum rule cost and maximum unigram probability.…”
Section: The Proposed Architecture (mentioning)
confidence: 99%
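
The candidate-generation scheme described in the statement above (an error model proposes candidates, a language model validates them, and the best proposal has minimum rule cost and maximum unigram probability) can be sketched roughly as follows. This is a minimal plain-Python illustration, not the cited system and not a finite-state implementation: the toy lexicon `UNIGRAM`, the rule costs in `COSTS`, and the helper names `error_model` and `suggest` are all hypothetical, and the error model is assumed to cover single edits only.

```python
import string

# Hypothetical toy unigram language model: word -> relative frequency.
UNIGRAM = {"cat": 0.5, "cart": 0.25, "chat": 0.15, "coat": 0.1}

# Hypothetical rule costs for the error model (one cost per edit type).
COSTS = {"delete": 1.0, "insert": 1.0, "substitute": 1.0, "transpose": 0.5}


def error_model(word, alphabet=string.ascii_lowercase):
    """Yield (candidate, rule_cost) for every single-edit rewrite of word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    for left, right in splits:
        if right:
            # Drop the first character of the right-hand part.
            yield left + right[1:], COSTS["delete"]
            # Replace it with every other letter.
            for ch in alphabet:
                if ch != right[0]:
                    yield left + ch + right[1:], COSTS["substitute"]
        if len(right) > 1 and right[0] != right[1]:
            # Swap two adjacent characters.
            yield left + right[1] + right[0] + right[2:], COSTS["transpose"]
        # Insert every letter at the split point.
        for ch in alphabet:
            yield left + ch + right, COSTS["insert"]


def suggest(word, limit=3):
    """Return up to `limit` corrections, best first."""
    if word in UNIGRAM:
        return [word]  # already accepted by the language model
    best = {}
    for cand, cost in error_model(word):
        # Validate against the language model; keep the cheapest rule cost.
        if cand in UNIGRAM and cost < best.get(cand, float("inf")):
            best[cand] = cost
    # Minimum rule cost first, then maximum unigram probability.
    ranked = sorted(best, key=lambda c: (best[c], -UNIGRAM[c]))
    return ranked[:limit]


print(suggest("cta"))
print(suggest("caat"))
```

In a weighted finite-state setting the same idea is usually expressed by composing an error-model transducer with a language-model automaton and extracting the lowest-weight paths; the lexicographic sort used here is only one possible way to combine the two scores.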
“…Recently, there has been a surge of interest in solving the spelling error correction problem via the web (e.g., Whitelaw et al, 2009; Sun et al, 2010) and to correct query strings for search engines (e.g., Duan and Hsu, 2011, and many others). Further approaches to spelling correction include finite state techniques (e.g., Pirinen and Lindén, 2014) and deep graphical models (e.g., Raaijmakers, 2013). Kukich (1992) summarizes many of the earlier approaches to spell checking such as based on trie-based edit distances.…”
Section: Related Work (mentioning)
confidence: 99%
“…This means that the language experts can collect and curate data, while the engineers improve and add NLP systems, and when a new or improved system for a specific NLP application is finalised, it can be applied to all languages providing language data in the infrastructure. In practice for example, this has in past meant, that when new research was published making weighted finite-state spell-checking and correction end-user usable [9], all languages in the infrastructure could have an additional (albeit basic) spell-checker and corrector. Both in GiellaLT infra and Apertium system this is implemented at low level by simply applying the necessary changes to all of the language repositories.…”
Section: Infrastructures and Resources (mentioning)
confidence: 99%