2021
DOI: 10.1111/exsy.12692
Learning from mistakes: Improving spelling correction performance with automatic generation of realistic misspellings

Abstract: Sequence-to-sequence (seq2seq) models require a large amount of labelled training data to learn the mapping between input and output. A large set of misspelled words together with their corrections is needed to train a seq2seq spelling correction system. Low-resource languages such as Turkish usually lack such large annotated datasets. Although misspelling-reference pairs can be synthesized with a random procedure, the generated dataset may not match genuine human-made misspellings well. This might degr…
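The random synthesis procedure the abstract contrasts against is typically a set of single-character edits applied to clean words. As a minimal illustrative sketch (not the paper's generation method; all function names here are hypothetical), such a baseline generator might look like:

```python
# Illustrative baseline only: synthesize noisy (misspelling, reference) pairs
# by applying one random single-character edit (insert, delete, substitute,
# or transpose) to each clean word. The paper's point is that such uniformly
# random noise may not resemble genuine human-made misspellings.
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def random_misspell(word, rng=random):
    """Apply one random character-level edit to `word`."""
    if len(word) < 2:
        return word
    op = rng.choice(["insert", "delete", "substitute", "transpose"])
    i = rng.randrange(len(word) - 1)
    if op == "insert":
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "substitute":
        return word[:i] + rng.choice(ALPHABET) + word[i + 1:]
    # transpose: swap the characters at positions i and i+1
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def make_pairs(vocabulary, n_pairs, seed=0):
    """Sample words from a vocabulary and pair each with a noisy variant."""
    rng = random.Random(seed)
    words = [rng.choice(vocabulary) for _ in range(n_pairs)]
    return [(random_misspell(w, rng), w) for w in words]
```

Each edit changes the word length by at most one character, so the synthetic pairs stay within edit distance 1 of their references; human misspellings, by contrast, follow keyboard-adjacency and phonetic patterns that this uniform sampling ignores.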

Cited by 3 publications
(1 citation statement)
References 26 publications
“…By applying a comprehensive analysis to the given sentence, it discerns potential words that are prone to alteration. Furthermore, it yields the word sequence with the highest probability by employing the Viterbi decoder algorithm [41,42]. Given that the texts found on news sites typically undergo an editorial process, it is infrequent for them to exhibit issues pertaining to spelling errors.…”
Section: Pre-processing of Text Data
confidence: 99%
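The citing work reportedly selects the most probable word sequence with a Viterbi decoder over candidate corrections. A generic sketch of that decoding step, under the assumption of per-position candidate sets with emission and transition log-probabilities (the scoring functions here are illustrative placeholders, not the cited system's actual model):

```python
# Hedged sketch: Viterbi decoding over candidate words per sentence position.
# Given candidate sets and log-probability scorers, return the single
# highest-scoring word sequence by dynamic programming.
import math

def viterbi(candidates, emit_logp, trans_logp):
    """candidates: list of candidate-word lists, one per sentence position.
    emit_logp(word, pos): log-probability of `word` at position `pos`.
    trans_logp(prev, word): log-probability of `word` following `prev`."""
    # best[w] = (score of best path ending in w, that path as a word list)
    best = {w: (emit_logp(w, 0), [w]) for w in candidates[0]}
    for pos in range(1, len(candidates)):
        new_best = {}
        for w in candidates[pos]:
            # Extend the best-scoring predecessor path with w.
            score, path = max(
                (s + trans_logp(p, w) + emit_logp(w, pos), path)
                for p, (s, path) in best.items()
            )
            new_best[w] = (score, path + [w])
        best = new_best
    return max(best.values())[1]  # path of the highest-scoring final word
```

The dynamic program keeps only the best path ending in each candidate word, so the cost is linear in sentence length and quadratic in the number of candidates per position, rather than exponential in the number of full sequences.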