Proceedings of the Third Workshop on Analytics for Noisy Unstructured Text Data 2009
DOI: 10.1145/1568296.1568306

A comprehensive evaluation methodology for noisy historical document recognition techniques

Abstract: In this paper, we propose a new comprehensive methodology in order to evaluate the performance of noisy historical document recognition techniques. We aim to evaluate not only the final noisy recognition result but also the main intermediate stages of text line, word and character segmentation. For this purpose, we efficiently create the text line, word and character segmentation ground truth guided by the transcription of the historical documents. The proposed methodology consists of (i) a semiautomatic proce…

Citations: cited by 8 publications (5 citation statements)
References: 21 publications
“…In black pixel projection histogram (N. Stamatopoulos and Gatos, 2009), characters are separated by cutting at turning points of the cross direction histogram. Figure 4 shows the black pixel projection histogram of a Kanji character.…”
Section: Black Pixel Projection Histogram
Citation type: mentioning
confidence: 99%
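
The quoted statement describes cutting characters apart at turning points of a black pixel projection histogram. A minimal sketch of that idea follows, under stated assumptions: the image is a binary grid with 1 for black pixels, a "turning point" is read as a valley where the histogram stops falling and starts rising, and the function names `projection_histogram` and `valley_cuts` are hypothetical. It is not the implementation used in the cited or citing work.

```python
def projection_histogram(image, axis="columns"):
    """Count black pixels (value 1) per column or per row of a binary image."""
    if axis == "columns":
        return [sum(col) for col in zip(*image)]
    return [sum(row) for row in image]


def valley_cuts(hist):
    """Return indices where the histogram turns from falling to rising.

    Assumption: a cut is placed at each valley; a flat valley (plateau of
    equal values) yields a single cut at its centre.
    """
    cuts = []
    n = len(hist)
    i = 1
    while i < n - 1:
        if hist[i] < hist[i - 1]:
            # Walk across a possible plateau of equal values.
            j = i
            while j + 1 < n and hist[j + 1] == hist[i]:
                j += 1
            # Only a plateau followed by a rise counts as a valley.
            if j + 1 < n and hist[j + 1] > hist[i]:
                cuts.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return cuts


# Toy binary image (1 = black pixel): two strokes separated by a blank column.
image = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]
hist = projection_histogram(image)  # black pixels per column
cuts = valley_cuts(hist)            # candidate cut columns
```

Passing `axis="rows"` gives the projection in the other direction, which is how the same profile could be used for cutting along lines rather than between characters.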
“…As far as we know, any ruby removal methods for early-modern Japanese printed books have not been studied. As for existing methods to remove ruby characters from current books with standard typography, there are two main methods (N. Stamatopoulos and Gatos, 2009) (Fletcher and Kasturi, 1988): (1) Separating ruby characters linearly using density histogram and (2) separating ruby characters using circumscription rectangles. Both methods assume the standard typography for the target books.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Even in relatively recent (e.g. early twentieth century) documents, typography, printing and language can differ widely from modern usage [3].…”
Section: Introduction: the Problem
Citation type: mentioning
confidence: 99%
“…Other tools are however oriented to evaluate the accuracy in the interpretation of the content (the printed characters and words). This type of evaluation compares the output with a reference which contains a highly accurate transcription (or ground truth) of the source document [12]. The creation of such ground truth is expensive but usually a limited size one is enough to obtain significant numbers provided that it contains a representative sample of the collection.…”
Citation type: mentioning
confidence: 99%
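
The last statement describes evaluating recognition output against a ground-truth transcription. One common way to do this is a character error rate based on edit distance; the sketch below assumes that metric, which the quoted text does not name, and the function names `levenshtein` and `character_error_rate` are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_output, ground_truth):
    """Edit distance normalised by the length of the ground-truth text."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_output, ground_truth) / len(ground_truth)


# Made-up example strings: one substituted character in a ten-character word.
cer = character_error_rate("histcrical", "historical")
```

Normalising by the ground-truth length keeps scores comparable across pages of different sizes, which fits the point that a small but representative ground-truth sample can be enough.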