2008
DOI: 10.1007/s10791-008-9055-y

Effect of OCR error correction on Arabic retrieval

Abstract: Arabic documents that are available only in print continue to be ubiquitous and they can be scanned and subsequently OCR'ed to ease their retrieval. This paper explores the effect of context-based OCR correction on the effectiveness of retrieving Arabic OCR documents using different index terms. Different OCR correction techniques based on language modeling with different correction abilities were tested on real OCR and synthetic OCR degradation. Results show that the reduction of word error rates needs to pas…

Cited by 9 publications (16 citation statements)
References 37 publications
“…So, term weighting functions may be introduced to assign importance to the individual words of a document representation, in such a manner that it can be more or less dependent on the collection misspelling (Taghva et al, 1994). At this point, experimental results (Magdy and Darwish, 2008) have proved that using a sufficiently large language model for correction can minimize the need for morphologically sensitive error repair.…”
Section: The Spelling Correction Approach
Citation type: mentioning
confidence: 99%
“…Within this context, there is a need to tackle aspects that have a decisive effect on the complexity of the problem, such as content heterogeneity (Huang and Efthimiadis, 2009; Kwon et al, 2009; Li et al, 2006) and the increasing size of the databases on which the search is performed (Celikik and Bast, 2009). This has led to the appearance of specific proposals both with regard to language (Hagiwara and Suzuki, 2009; Magdy and Darwish, 2008; Suzuki et al, 2009) and the area of knowledge under consideration (Wilbur et al, 2006), making it advisable to foresee the inclusion of mechanisms for managing misspelled queries of this nature during the design stage of IR tools (Konchady, 2008).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In recent years, Magdy and Darwish [11] investigated the effect of OCR correction techniques on the effectiveness of retrieving Arabic document images using distinct index terms. Results show that effects on retrieval are recognisable only if the reduction of word error rates surpasses a given limit.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…To remedy this problem, several methods are proposed. These methods generally try to correct OCR errors or expand the query [11,14,18]. In these works, it is shown that negative effects of OCR errors can be reduced by such advanced methods.…”
Section: Typical Components Of An IR System
Citation type: mentioning
confidence: 99%