2019
DOI: 10.1016/j.patcog.2018.10.004

CONFIRM – Clustering of noisy form images using robust matching

Abstract: Identifying the type of a scanned form greatly facilitates processing, including automated field segmentation and field recognition. Unlike the majority of existing techniques, we focus on unsupervised type identification, where the set of form types is not known a priori, and on noisy collections that contain very similar document types. This work presents a novel algorithm: CONFIRM (Clustering Of Noisy Form Images using Robust Matching), which simultaneously discovers the types in a collection of forms …

Citations: cited by 7 publications (2 citation statements)
References: 41 publications
“…Tensmeyer and Martinez applied an unsupervised clustering based approach to cluster visually similar noise images [16]. They employed a 3-stage scalable clustering approach which first clusters a subset of the data, then these clusters are further split to create purer subclusters, and at last a classifier is trained on top to recreate the subclusters.…”
Section: Related Work (mentioning, confidence: 99%)
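The three-stage pipeline summarized in the citation statement above (cluster a subset, split clusters into purer subclusters, then train a classifier to recreate the subclusters) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the cited authors' implementation. It assumes each form image has already been reduced to a numeric feature vector, uses plain k-means for stage 1, uses recursive 2-means bisection until a cluster's mean squared distance to its centroid is at most a threshold (`max_spread`) as a stand-in "purity" criterion for stage 2, and uses a nearest-centroid classifier for stage 3. All function and parameter names are invented for this sketch.

```python
# Hypothetical sketch of a 3-stage scalable clustering pipeline.
# Assumptions: precomputed feature vectors, k-means for stage 1,
# spread-threshold bisection for stage 2, nearest-centroid classifier for stage 3.
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Vector]) -> Vector:
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def spread(cluster: Sequence[Vector]) -> float:
    """Mean squared distance to the centroid, used here as an impurity proxy."""
    c = mean(cluster)
    return sum(sq_dist(p, c) for p in cluster) / len(cluster)


def kmeans(points: Sequence[Vector], k: int, iters: int = 50, seed: int = 0) -> List[List[Vector]]:
    """Plain Lloyd's k-means with random initial centroids; returns non-empty clusters."""
    rng = random.Random(seed)
    centroids: List[Vector] = rng.sample(list(points), k)
    clusters: List[List[Vector]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # converged
            break
        centroids = updated
    return [c for c in clusters if c]


def split_until_pure(cluster: List[Vector], max_spread: float, seed: int = 0) -> List[List[Vector]]:
    """Stage 2: recursively bisect any cluster whose spread exceeds max_spread."""
    if len(cluster) < 2 or spread(cluster) <= max_spread:
        return [cluster]
    halves = kmeans(cluster, 2, seed=seed)
    if len(halves) < 2:  # bisection failed (e.g. duplicate points); keep as-is
        return [cluster]
    result: List[List[Vector]] = []
    for half in halves:
        result.extend(split_until_pure(half, max_spread, seed))
    return result


def three_stage_cluster(points: List[Vector], k: int, sample_size: int,
                        max_spread: float, seed: int = 0) -> List[int]:
    """Cluster a sample, purify its clusters, then label every point with a classifier."""
    rng = random.Random(seed)
    sample = rng.sample(list(points), min(sample_size, len(points)))
    coarse = kmeans(sample, k, seed=seed)  # stage 1: cluster a subset
    fine = [sub for c in coarse for sub in split_until_pure(c, max_spread, seed)]  # stage 2
    centroids = [mean(sub) for sub in fine]  # stage 3: nearest-centroid "classifier"
    return [min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i])) for p in points]


# Usage: three well-separated toy groups, deliberately under-clustered with k=2.
groups = {
    "A": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "B": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
    "C": [(20.0, 0.0), (21.0, 0.0), (20.0, 1.0)],
}
points = [p for pts in groups.values() for p in pts]
labels = three_stage_cluster(points, k=2, sample_size=len(points), max_spread=2.0)
print(labels)
```

Setting `k` below the true number of groups is deliberate here: stage 2 is what separates merged groups, because any cluster that mixes far-apart points exceeds `max_spread` and is bisected again. A real system would substitute task-specific features, a purity measure suited to form images, and a stronger classifier than nearest centroid.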
“…This model has a complex pipeline that depends heavily on traditional hand-crafted features; in contrast, our approach achieves better results using a pipeline that is based entirely on unsupervised feature learning. Moreover, the CONFIRM algorithm [24] uses page elements such as OCR transcriptions and rule lines to obtain collection-dependent features.…”
Section: Related Work, A. Document Image Classification (mentioning, confidence: 99%)