A document image retrieval method tolerating recognition and segmentation errors of OCR using shape-feature and multiple candidates

Kameshiro, T.; Hirano, Teruaki; Okada, Y.; Yoda, F.

doi:10.1109/icdar.1999.791879

Cited by 13 publications

(5 citation statements)

References 2 publications

“…Motivated by this observation, some retrieval methods with the ability to tolerate the recognition errors of OCR have been researched later (Ohtam et al 1997). Additionally, some methods were reported to improve retrieval performance by using OCR candidates (Kameshiro et al 1999; Katsuyama 2002).…”

Section: Introduction (mentioning)

confidence: 95%

“…There are two primary approaches to locate the desirable text in the document images for retrieving the appropriate information; Optical Character Recognition (OCR) technique (Kameshiro et al 1999) and Document Image Retrieval (Keyword spotting) technique (Doermann 1998). Optical Character Recognition deals with the machine recognition of characters present in an input image obtained using scanning operation (Doermann 1998).…”

Section: Introduction (mentioning)

confidence: 99%

A survey of keyword spotting techniques for printed document images

2010

“…To search for a keyword in document images, first of all, by optical character recognition (OCR), we have to convert the format of document images from pictorial format to text format, which is translatable by the machine [1], and then by the use of the traditional methods of document retrieval, the target word is sought in the text. Although OCR is frequently used by researchers in this area, it has some disadvantages that cause OCR to be inappropriate in all retrieval cases.…”

Section: Introduction (mentioning)

confidence: 99%

Farsi document image recognition system using word layout signature

Ergun, Norozpour. 2019. Turk J Elec Eng & Comp Sci

In this paper, a new representation of Farsi words is proposed to present the keyword spotting problems in Farsi document image retrieval. In this regard, we define a signature for each Farsi word based on the word connected component layout. The mentioned signature is shown as boxes, and then, by sketching vertical and horizontal lines, we construct a grid of each word to provide a new descriptor. One of the advantages of this method is that it can be used for both handwritten and machine-printed texts. Finally, to evaluate the performance of our system in comparison to other methods, a database that contains 19,582 printed Farsi words is examined, and after applying this approach, a recall rate of 98.1% and a precision rate of 94.3% are obtained.

“…For document retrieval from large database, it is necessary to build an index containing multiple candidate recognition results so as to overcome the recognition error. According to the indexing technique, handwritten document retrieval methods can be categorized into two groups: indexing by character recognition (transcription) [4, 17, 18] and lexicon-driven indexing [1, 42]. Transcription-based text search relies on the character recognition accuracy.…”

Section: Introduction (mentioning)

confidence: 99%

Keyword Spotting From Online Chinese Handwritten Documents Using One-Versus-All Character Classification Model

Zhang, Wang, Liu, et al. 2013. Int. J. Patt. Recogn. Artif. Intell.

In this paper, we propose a method for text-query-based keyword spotting from online Chinese handwritten documents using character classification model. The similarity between the query word and handwriting is obtained by combining the character classification scores. The classifier is trained by one-versus-all strategy so that it gives high similarity to the target class and low scores to the others. Using character classification-based word similarity also helps overcome the out-of-vocabulary (OOV) problem. We use a character-synchronous dynamic search algorithm to efficiently spot the query word in large database. The retrieval performance is further improved by using competing character confusion and writer-adaptive thresholds. Our experimental results on a large handwriting database CASIA-OLHWDB justify the superiority of one-versus-all trained classifiers and the benefits of confidence transformation, character confusion and adaptive thresholds. Particularly, a one-versus-all trained prototype classifier performs as well as a linear support vector machine (SVM) classifier, but consumes much less storage of index file. The experimental comparison with keyword spotting based on handwritten text recognition also demonstrates the effectiveness of the proposed method.
