2006
DOI: 10.1007/11669487_33

Performance Comparison of Six Algorithms for Page Segmentation

Abstract: This paper presents a quantitative comparison of six algorithms for page segmentation: X-Y cut, smearing, whitespace analysis, constrained text-line finding, Docstrum, and Voronoi-diagram-based. The evaluation is performed using a subset of the UW-III collection commonly used for evaluation, with a separate training set for parameter optimization. We compare the results using both default parameters and optimized parameters. In the course of the evaluation, the strengths and weaknesses of each algori…

Citations: Cited by 52 publications (36 citation statements)
References: References 17 publications
“…[10] and Gorman [15] are the most widely cited algorithms to perform geometric page segmentation on non-Manhattan layouts of non overlapping zones. Evaluation experiments [17] have shown that the Voronoi based approach excels on a mixed dataset of both handwritten and machine printed documents for diverse scripts such as English and Arabic. Nevertheless, there are still challenges which the technique faces with such datasets.…”
Section: Non-manhattan Layout Based (mentioning)
confidence: 99%
“…Extra edges are added on the boundaries, as voronoi segmentation does not produce those. [17,12] measure overall error as the percentage of ground-truth text-lines correctly contained within zones. A ground-truth text-line is said to lie completely within one detected zone if the area overlap between the two is significant.…”
Section: Polygonal Zones (mentioning)
confidence: 99%
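The text-line containment measure described in the statement above can be sketched roughly as follows. This is only an illustration under stated assumptions: zones and text-lines are simplified to axis-aligned boxes (the citing work uses polygonal zones), the 0.9 overlap threshold stands in for "significant" overlap, and all function names are hypothetical rather than taken from either paper.

```python
from typing import List, Tuple

# Assumed representation: axis-aligned box (x0, y0, x1, y1).
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (zero if they do not meet)."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def line_in_zone(line: Box, zone: Box, min_overlap: float = 0.9) -> bool:
    """True if at least `min_overlap` of the line's area lies in the zone.

    The 0.9 default is an assumption, not a value from the source.
    """
    line_area = area(line)
    return line_area > 0 and intersection_area(line, zone) / line_area >= min_overlap


def fraction_contained(lines: List[Box], zones: List[Box],
                       min_overlap: float = 0.9) -> float:
    """Fraction of ground-truth text-lines lying within some detected zone."""
    if not lines:
        return 0.0
    hits = sum(
        1 for line in lines
        if any(line_in_zone(line, zone, min_overlap) for zone in zones)
    )
    return hits / len(lines)
```

For example, `fraction_contained(lines, zones)` returns a value in [0, 1]; an error rate in this simplified setting would be one minus that value.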
“…We use a fast labeling algorithm to extract connected components from the document image. In recent comparative studies for the performance evaluation of page segmentation algorithms [1,10], it is shown that the constrained textline finding algorithm [3] has the lowest error rates among the compared algorithms for textline extraction, and the Voronoi-diagram based algorithm [5] has the lowest error rates for extracting zone-level information. Therefore, we use these two algorithms for extracting textlines and zones from the document image, respectively.…”
Section: Geometric Matching For Page Frame Detection (mentioning)
confidence: 99%
“…The whole image is binarized using a single threshold value in global techniques (Otsu, 1979;White and Rohrer, 1983;Chi et al, 1996;Cattoni et al, 1998;Viola and Jones, 2004;Shafait et al, 2006). Grey value of each pixel is compared with the single threshold value for binarization.…”
Section: Introduction (mentioning)
confidence: 99%
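The global binarization described in the statement above, where every pixel is compared against one threshold, might look like the following minimal sketch. The function name, the list-of-rows image layout, and the convention that dark pixels map to 1 (foreground) are assumptions made for illustration; how the threshold itself is chosen (for example by Otsu's method) is not shown.

```python
from typing import List


def binarize_global(gray: List[List[int]], threshold: int) -> List[List[int]]:
    """Binarize a grayscale image with a single global threshold.

    Assumed convention: pixels darker than `threshold` become 1 (ink),
    all others become 0 (background).
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]
```

Because the same threshold applies to the whole page, a sketch like this would likely struggle with uneven illumination, which is the usual motivation for local (adaptive) techniques.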