2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
DOI: 10.1109/icdar.2017.332

Machine Learning vs Deterministic Rule-Based System for Document Stream Segmentation

Cited by 7 publications (4 citation statements). References 14 publications.
“…A contextual and layout descriptor-based approach that represented the relationship of two consecutive pages of document stream was presented by Hamdi et al [5], [14]. In this approach, every page was represented with binary features of contextual and layout information, such as the textual fingerprint, ending signs, page number, dates, etc.…”
Section: Related Work (mentioning)
confidence: 99%
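The pairwise-descriptor idea can be illustrated with a minimal Python sketch. It assumes plain-text pages and uses hypothetical stand-ins for the features named in the excerpt: a "Page i of n" regex for page numbers, a short list of closing formulas as ending signs, a dd/mm/yyyy pattern for dates, and the first non-empty line of a page as a crude textual fingerprint. None of these patterns come from Hamdi et al.; the sketch only shows how each consecutive page pair can be reduced to a binary feature vector.

```python
import re

# Hypothetical "ending signs": closing formulas that often end a letter.
ENDING_PATTERNS = [r"\bsincerely\b", r"\bbest regards\b", r"\byours faithfully\b"]
# Hypothetical page-number marker, e.g. "Page 2 of 3" or "page 2/3".
PAGE_NUM_RE = re.compile(r"\bpage\s+(\d+)\s*(?:/|of)\s*(\d+)\b", re.IGNORECASE)
# Hypothetical date format: dd/mm/yyyy.
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def page_number(text):
    """Return (current, total) from a 'page i of n' marker, or None if absent."""
    m = PAGE_NUM_RE.search(text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def fingerprint(text):
    """Hypothetical fingerprint: the first non-empty line, lowercased (e.g. a letterhead)."""
    for line in text.splitlines():
        if line.strip():
            return line.strip().lower()
    return ""


def pair_features(prev_text, curr_text):
    """Binary descriptor of the relationship between two consecutive pages."""
    prev_num, curr_num = page_number(prev_text), page_number(curr_text)
    prev_fp, curr_fp = fingerprint(prev_text), fingerprint(curr_text)
    return {
        "prev_has_ending_sign": int(
            any(re.search(p, prev_text, re.IGNORECASE) for p in ENDING_PATTERNS)
        ),
        "page_number_continues": int(
            prev_num is not None
            and curr_num is not None
            and curr_num[0] == prev_num[0] + 1
            and curr_num[1] == prev_num[1]
        ),
        "curr_is_first_page": int(curr_num is not None and curr_num[0] == 1),
        "shared_date": int(bool(set(DATE_RE.findall(prev_text)) & set(DATE_RE.findall(curr_text)))),
        "same_fingerprint": int(prev_fp != "" and prev_fp == curr_fp),
    }


prev = "ACME Corp\nInvoice 12 dated 03/04/2017\nPage 1 of 2"
curr = "ACME Corp\nTotal due, 03/04/2017\nPage 2 of 2"
print(pair_features(prev, curr))
```

Here every feature except the ending sign fires, which is the pattern one would expect for a continuation page.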
“…A two-class classifier was trained using a decision tree to classify the pages into either a continuation or a break class where continuation class determines a page to be a continuation of the previous page, and break class determines the beginning of a new document. In a continuous effort to find the best approach, the authors compared the segmentation result using both rule-based and a machine learning-based approach to define the features and found the machine learning-based approach to produce better results than the rule-based approach [5].…”
Section: Related Work (mentioning)
confidence: 99%
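The continuation/break set-up can be sketched in Python, assuming each page pair has already been reduced to four binary features (the hypothetical names in `FEATURES`). `rule_based` is a made-up hand-written rule, not the rules from [5], and `build_tree` is a small ID3 decision tree written from scratch rather than any particular library's implementation. The training pairs are invented toy data, so the accuracies printed here say nothing about the results reported in the paper; the sketch only shows how the two classifiers are compared on the same labeled pairs.

```python
import math
from collections import Counter

# Hypothetical binary features describing (previous page, current page).
FEATURES = ["prev_has_ending_sign", "page_number_continues", "curr_is_first_page", "same_fingerprint"]


def rule_based(x):
    """Hand-written rule (an assumption for illustration only)."""
    if x[2]:  # current page announces itself as page 1
        return "break"
    if x[1]:  # page numbering continues from the previous page
        return "continuation"
    return "break" if x[0] and not x[3] else "continuation"


def entropy(labels):
    counts, total = Counter(labels), len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def build_tree(rows, labels, features):
    """ID3 on binary features: a leaf is a label, a node is (feature, subtree_if_0, subtree_if_1)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(f):
        splits = [[lab for r, lab in zip(rows, labels) if r[f] == v] for v in (0, 1)]
        return entropy(labels) - sum(len(s) / len(labels) * entropy(s) for s in splits if s)

    best = max(features, key=gain)
    if gain(best) <= 0:
        return majority
    rest = [f for f in features if f != best]
    branches = []
    for v in (0, 1):
        sub = [(r, lab) for r, lab in zip(rows, labels) if r[best] == v]
        if not sub:
            branches.append(majority)  # empty branch falls back to the parent majority
        else:
            sub_rows, sub_labels = zip(*sub)
            branches.append(build_tree(list(sub_rows), list(sub_labels), rest))
    return (best, branches[0], branches[1])


def predict(tree, x):
    while isinstance(tree, tuple):
        f, if0, if1 = tree
        tree = if1 if x[f] else if0
    return tree


def accuracy(classify, data):
    return sum(classify(x) == y for x, y in data) / len(data)


# Toy, made-up labeled page pairs: (feature tuple in FEATURES order, label).
train = [
    ((0, 1, 0, 1), "continuation"),
    ((0, 0, 0, 1), "continuation"),
    ((1, 0, 1, 0), "break"),
    ((0, 0, 1, 0), "break"),
    ((1, 0, 0, 0), "break"),
    ((0, 0, 0, 0), "break"),
]

tree = build_tree([x for x, _ in train], [y for _, y in train], list(range(len(FEATURES))))
print("tree:", tree)
print("rule-based accuracy:", accuracy(rule_based, train))
print("decision-tree accuracy:", accuracy(lambda x: predict(tree, x), train))
```

On this toy set the tree learns to split on the fingerprint feature alone, while the fixed rule mislabels the pair with no features set. Evaluating on the training data is only for brevity; a real comparison would use held-out pairs.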
“…These descriptors can be section numbers, page numbers, dates, salutation and conclusion formulas. The technique in [Hamdi et al, 2017] and [Hamdi et al, 2018], uses Doc2Vec model to realize the segmentation task. At first, the Doc2Vec is trained to learn the documents pages representation.…”
Section: Related Work (mentioning)
confidence: 99%
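The excerpt is truncated before it says how the learned page representations are turned into document boundaries. One plausible scheme, offered here only as an assumption, is to compare consecutive page vectors and start a new document whenever their cosine similarity drops below a threshold. The Python sketch below swaps the trained Doc2Vec embedding for a simple bag-of-words vector so that it runs with the standard library alone; with a library such as gensim one would instead take vectors from a trained Doc2Vec model. The threshold of 0.2 is arbitrary.

```python
import math
import re
from collections import Counter


def page_vector(text):
    """Bag-of-words counts: a stand-in for a learned Doc2Vec page embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(pages, threshold=0.2):
    """Group consecutive pages into documents; returns lists of page indices.

    A page whose similarity to the previous page falls below `threshold`
    (an arbitrary, assumed value) is treated as a break.
    """
    if not pages:
        return []
    vectors = [page_vector(p) for p in pages]
    docs = [[0]]
    for i in range(1, len(pages)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            docs.append([i])  # break: page i begins a new document
        else:
            docs[-1].append(i)  # continuation: page i extends the current document
    return docs


pages = [
    "invoice acme corp amount due invoice number",
    "invoice acme corp total amount due payment",
    "medical report patient blood pressure",
    "patient blood test results report",
]
print(segment(pages))  # [[0, 1], [2, 3]]
```

The two invoice pages share most of their vocabulary and stay together, while the first medical page shares no words with the preceding invoice page and therefore opens a second document.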