2022
DOI: 10.1007/s42979-022-01499-x

Automatic Building of a Large Arabic Spelling Error Corpus

Cited by 4 publications (3 citation statements)
References 17 publications
“…Aichaoui et al [61] created a large dataset of spelling errors, called SPIRAL, for use as data for training deep-learning models that aim to fix spelling errors due to the lack of such a large set in Arabic. They collected textual data from various newspaper sites, such as Okaz and available open Arabic corpora sites, such as Maktabah Shamlah.…”
Section: Post-OCR Correction Work [61-71] (mentioning)
confidence: 99%
“…In addition, GANs must be combined with a classifier, which contributes to enhancing the accuracy on one hand and the model's complexity on the other; • There is a crucial need for more public handwritten Arabic OCR datasets in terms of quantity, quality, scope, and font diversity in order for OCR research to progress. One significant limitation of the work of Aichaoui et al [61] is that the corpus building has an obviously uneven distribution between the types and categories of error; therefore, a lack of accuracy and results is associated with low-frequency errors. Evidence of this is shown as the numbers of deletion and keyboard errors (which obtained the worst recall, at 0.337 and 0.644, respectively) were, in fact, among the lowest in the dataset, comprising 0.67% and 6.26%, respectively.…”
Section: GANs-based Work [57-60] (mentioning)
confidence: 99%