Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication 2015
DOI: 10.1145/2701126.2701138

Hybrid page segmentation using multilevel homogeneity structure

Abstract: This paper presents a hybrid page segmentation method that combines connected component analysis with classification on multilevel homogeneous regions. The method is iterative: connected component analysis classifies the non-text elements at each level of homogeneous region, and the multilevel homogeneity structure ensures that this classification identifies all non-text elements. The result of this iterative method is two documents, a text document and a non-tex…
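
The abstract gives only the outline of the iterative classification, so the following is a rough sketch of the general idea rather than the paper's algorithm. It labels 8-connected components in a binary image and then repeatedly moves components that are much taller than the median of the remaining ones into a non-text set. The height-ratio rule, the `ratio` and `max_rounds` parameters, and both function names are assumptions made for illustration; the multilevel homogeneity structure itself is not modelled here.

```python
from collections import deque
from statistics import median


def connected_components(grid):
    """Label 8-connected foreground (1) pixels; return one dict per component."""
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append({"box": (top, left, bottom, right), "area": area})
    return components


def box_height(component):
    top, _, bottom, _ = component["box"]
    return bottom - top + 1


def split_text_nontext(components, ratio=3.0, max_rounds=10):
    """Hypothetical filter: iteratively peel off components far taller than the median."""
    text = list(components)
    non_text = []
    for _ in range(max_rounds):
        if not text:
            break
        typical = median(box_height(c) for c in text)
        outliers = [c for c in text if box_height(c) > ratio * typical]
        if not outliers:
            break  # nothing left that looks unlike the remaining components
        non_text.extend(outliers)
        text = [c for c in text if box_height(c) <= ratio * typical]
    return text, non_text
```

Calling `split_text_nontext(connected_components(binary_image))` on a list-of-lists binary image returns two lists of components, which loosely mirrors the split into a text document and a non-text document that the abstract mentions.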

Cited by 13 publications (7 citation statements)
References 24 publications

“…Top-down approaches could instead provide more efficiency in computation, but only limited to processing specific types, like documents with a manhattan-based layout. Hence, hybrid methods [38,40] were adopted in literature which could combine both top-down and bottom-up cues to generate better results. Prior to deep learning era, rule-based segmentation techniques [7,10,37] were also used profoundly to solve the table detection problem.…”
Section: Heuristic Rule-based Document Layout Analysis | Citation type: mentioning
confidence: 99%
“…Asi et al [2] came up with a multi-scale texture-based algorithm for document images where Gabor filters were applied to locate different regions and a minimization energy function was applied to segment them. Despite successes of both top-down and bottom-up strategies, there are techniques [51] that have integrated both of them to segment regions in digital documents with complex layouts.…”
Section: Traditional Document Layout Segmentation | Citation type: mentioning
confidence: 99%
“…Based on the binary image, we use the algorithm [19] to extract CC and store the coordinates. Let CCs be all the CC, CC_i is the i-th connected component in binary […] In 2015, Tran et al [21] proposed an effective method for classifying the text and non-text elements based on the multilevel classification. However, the goal of our system is the table detection where table candidates (non-text elements) are very different with text elements.…”
Section: Pre-processing | Citation type: mentioning
confidence: 99%
“…However, the goal of our system is the table detection where table candidates (non-text elements) are very different with text elements. Therefore, we only used the heuristic filter in [21] to determine the nontext elements for ruling line table candidate. In an addition to find the color table candidates, the CC_i is classified the non-text element if CC_i's filled area is big and approximate…”
Section: Pre-processing | Citation type: mentioning
confidence: 99%