Document layout structure extraction using bounding boxes of different entities

Liang, Jisheng; Ha, Jiyeon; Haralick, Robert M.; Phillips, Ihsin T.

doi:10.1109/acv.1996.572074

Cited by 23 publications

(12 citation statements)

References 6 publications


“…In this section, we briefly present a rule-based algorithm that extracts document layout structure using the bounding boxes of different entities [15]. Then we report the performance of each module on the images from the UW-III Document Image Database.…”

Section: Results (mentioning)

confidence: 99%

Performance Evaluation of Document Structure Extraction Algorithms

Liang

Phillips

Haralick

2001

Computer Vision and Image Understanding


“…Since then, many other works dealing with high-level form representation, studying the structural relation among fields and often pursuing a completely automatic form analysis have been presented. Some of them, are essentially rule-based, as [7], [6], [16] and [18]. Other use graphs to establish relations among the fields, as [2] and [20].…”

Section: Related Work (mentioning)

confidence: 99%

A Model-Based Field Frame Detection for Handwritten Filled-in Forms

Pérez-Cortés

Andreu

Arlandis

2008

2008 the Eighth IAPR International Workshop on Document Analysis Systems


In this paper, a method for detection and model definition of field frames in forms intended for handwritten data input is presented. The pre-printed field frames are extracted and parameterized to contribute to several tasks in the context of high-volume data-entry form scanning: assistance to the operator in the process of form specification, fine location of the pre-printed field frames, frame dropout and character restoration in the frequent case that handwritten characters data cross the field boundaries. Examples on real forms with very different layouts and characteristics are presented.


“…These methods typically separate the original document into many different regions. Then use many filters to classify each region [5,18,20] (only one level of homogeneous region is used). In addition to creating many filters, these methods only effective when the region is not too complicated.…”

Section: Introduction (mentioning)

confidence: 99%

Hybrid page segmentation using multilevel homogeneity structure

Tran

Kim

2015

Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication


This paper presents a hybrid method of page segmentation based on the combination of connected component analysis and classification on multilevel homogeneous regions. This suggests an iterative method. In which, connected component analysis is used to classify the non-text elements at each level of homogeneous region, and multilevel homogeneity structure is used to ensure this classification can identify all non-text elements. The result of this iterative method is the two documents, text document and non-text document. On text document, adaptive mathematical morphology in each text homogeneous region will give us the corresponding text region. On the non-text document, more detailed classification of the non-text components are made to get separators, tables, images, etc. For evaluation, we experiment our method with datasets from ICDAR2009 page segmentation competition. According to the results, our proposed method achieves the higher accuracy compared to other methods. This proves the effectiveness and superiority of our proposed method.
