2022
DOI: 10.14569/ijacsa.2022.0130594
Correcting Arabic Soft Spelling Mistakes using BiLSTM-based Machine Learning

Abstract: Soft spelling mistakes are a class of mistakes that is widespread among native Arabic speakers and foreign learners alike. Some of these mistakes are typographical in nature. They occur due to orthographic variations of some Arabic letters and the complex rules that dictate their correct usage. Many people forgo these rules, and given the identical phonetic sounds, they often confuse such letters. In this paper, we investigate how to use machine learning to correct such mistakes given that there are no suffici…


Cited by 8 publications (4 citation statements)
References 40 publications
“…Pienaar and Snyman use them for the identification of eleven official South African languages [5]. Abandah et al [6] use a confusion matrix to correct spelling mistakes in Arabic when there are insufficient datasets to train the correction models.…”
Section: Confusion Matrix
confidence: 99%
“…Abandah et al [64] addressed soft Arabic spelling errors, which usually occur due to typography, since Arabic letters possess orthographic variations. The BiLSTM network was proposed to correct such spelling errors at the character level.…”
Section: Post-OCR Correction Work [61-71]
confidence: 99%
“…Abandah et al [64] used a test set that was excessively small compared with the training sets, as it consisted of 2443 words, which is 0.10% of the Tashkeela set's word count.…”
Section: GANs-based Work [57-60]
confidence: 99%
“…In [10], Moslem et al introduced a many-to-one neural network-based context-sensitive spelling checking and correction model, in which the words both before and after the word to be corrected serve as the conditional context for language model predictions. Finally, Abandah et al [23] used stacked Long Short-Term Memory (LSTM) modules to correct common soft Arabic spelling errors, with stochastic error injection over a limited set of characters to capture the most frequent mistakes.…”
Section: Literature Review
confidence: 99%
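The stochastic error injection mentioned in the statement above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the confusion groups shown (hamza/alef forms, taa marbuta vs. haa, alef maqsura vs. yaa) are the letter classes commonly behind soft Arabic spelling mistakes, but the exact groups, the injection rate, and the function name are illustrative assumptions.

```python
import random

# Illustrative confusable Arabic letter groups behind common "soft"
# spelling mistakes: alef/hamza forms, taa marbuta vs. haa,
# and alef maqsura vs. yaa. (Assumed groups, not the paper's exact sets.)
CONFUSION_GROUPS = [
    ["ا", "أ", "إ", "آ"],
    ["ة", "ه"],
    ["ى", "ي"],
]
# Map each confusable character to its group for O(1) lookup.
CONFUSABLE = {ch: group for group in CONFUSION_GROUPS for ch in group}


def inject_soft_errors(text, rate=0.2, rng=None):
    """Return a corrupted copy of `text` in which each confusable
    character is replaced, with probability `rate`, by another member
    of its confusion group. Used to synthesize (noisy, clean) training
    pairs when no large error-annotated corpus is available."""
    rng = rng or random.Random(0)  # seeded for reproducible corpora
    out = []
    for ch in text:
        group = CONFUSABLE.get(ch)
        if group and rng.random() < rate:
            out.append(rng.choice([c for c in group if c != ch]))
        else:
            out.append(ch)
    return "".join(out)
```

A character-level corrector would then be trained to map the corrupted string back to the original, e.g. `inject_soft_errors("مدرسة")` paired with `"مدرسة"` as the target; with `rate=0.0` the text passes through unchanged.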