Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage 2014
DOI: 10.1145/2595188.2595221

An open-source OCR evaluation tool

Abstract: This paper describes an open-source tool which computes statistics of the differences between a reference text and the output of an OCR engine. It also facilitates the spotting of mismatches by generating an aligned bitext where the differences are highlighted and cross-linked. The tool accepts a variety of input formats (both for the reference and the OCR output) and can also be used to compare the output of two different OCR engines. Some considerations on the criteria to compare the textual content of two files…
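The headline statistics such a tool reports are error rates derived from the edit distance between the reference and the OCR output. As a minimal sketch of the character error rate (CER) computation (an illustration of the metric, not ocrevalUAtion's actual code):

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # delete a reference char
                            curr[j - 1] + 1,          # insert a spurious char
                            prev[j - 1] + (r != h)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def cer(reference: str, ocr_output: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    return levenshtein(reference, ocr_output) / len(reference)

# Two substitutions in 34 characters -> CER of about 0.059
print(cer("An open-source OCR evaluation tool",
          "An open-sourse OCR evaluatlon tool"))
```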

Cited by 24 publications (14 citation statements, published 2018–2023)
References 10 publications
“…Cf. also Märgner and El Abed (2014) and Carrasco (2014). 6 This version was produced by the subcontractor when the GT data was formed.…”
Section: Notes (mentioning)
confidence: 99%
“…Furthermore, we prioritize the improvement on the standard WER over its positional-independent counterpart, with the intent to preserve the syntactic and semantic attributes of the extracted text, and support meaningful Natural Language Processing and Topic Evolution analyses of the obtained documents. To record these three metrics, we used the ocrevalUAtion open source tool [24]. Besides the described metrics, this tool also reports error rates by character, and aligned bitext for each document match, facilitating the comparison of the generated output against the reference text [24].…”
Section: E. Extracted Text Evaluation (mentioning)
confidence: 99%
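For context: standard WER aligns the two word sequences in order, while the order-independent (positional-independent) variant compares them as bags of words, so pure reorderings are not penalised. A hedged sketch of both metrics, assuming whitespace tokenisation and one common bag-of-words formulation (this mirrors the metrics named in the excerpt, not ocrevalUAtion's internals):

```python
from collections import Counter

def wer(ref: str, hyp: str) -> float:
    """Standard WER: word-level edit distance over the reference word count."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        curr = [i]
        for j, hw in enumerate(h, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (rw != hw)))
        prev = curr
    return prev[-1] / len(r)

def wer_order_independent(ref: str, hyp: str) -> float:
    """Order-independent WER (assumed formulation): words left unmatched
    after a multiset intersection count as errors."""
    r, h = Counter(ref.split()), Counter(hyp.split())
    matched = sum((r & h).values())
    return (max(sum(r.values()), sum(h.values())) - matched) / sum(r.values())

print(wer("the quick brown fox", "quick the brown fox"))                    # 0.5
print(wer_order_independent("the quick brown fox", "quick the brown fox"))  # 0.0
```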
“…To record these three metrics, we used the ocrevalUAtion open source tool [24]. Besides the described metrics, this tool also reports error rates by character, and aligned bitext for each document match, facilitating the comparison of the generated output against the reference text [24]. We set up the evaluation so that punctuation and case differences were counted as well, when calculating the scores.…”
Section: E. Extracted Text Evaluation (mentioning)
confidence: 99%
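Counting punctuation and case differences, as these authors chose to, is the strictest setting; the common alternative is to normalise both texts before scoring. A hypothetical pre-processing step (the parameter names are illustrative, not ocrevalUAtion's actual options):

```python
import string

def normalise(text: str, ignore_case: bool = False,
              ignore_punctuation: bool = False) -> str:
    """Optionally fold case and strip punctuation before computing CER/WER."""
    if ignore_case:
        text = text.lower()
    if ignore_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())  # collapse whitespace runs

# Strict setting (as in the study above): score the raw texts.
# Lenient setting: cer(normalise(ref, True, True), normalise(hyp, True, True))
```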
“…We extend our research to measure the impact of Combined Vertical Projection on the accuracy of OCR. The OCR accuracy is measured in Character Error Rate (CER), Word Error Rate (WER), and order-independent WER [8]. The computational time needed is also measured over 3 different iterations of the Combined Vertical Projection technique.…”
Section: Introduction (mentioning)
confidence: 99%