2011 IEEE 11th International Conference on Data Mining
DOI: 10.1109/icdm.2011.102

Mining Historical Documents for Near-Duplicate Figures

Abstract: The increasing interest in archiving all of humankind's cultural artifacts has resulted in the digitization of millions of books, and soon a significant fraction of the world's books will be online. Most of the data in historical manuscripts is text, but there is also a significant fraction devoted to images. This fact has driven much of the recent increase in interest in query-by-content systems for images. While querying/indexing systems can undoubtedly be useful, we believe that the historical manu…

Cited by 6 publications (2 citation statements)
References 19 publications
“…In 2007, Yankow et al [37] presented a uniform scaling approach to match differently scaled shapes. Rakthanmanon et al [24] matched near duplicate figures found in historical documents using a time series approach. Our work is motivated by the above techniques but, as we will show, we require significantly specialized pipelines to accommodate shredded documents.…”
Section: Related Work (mentioning)
confidence: 99%
“…Choice of fragment encoding: There are a number of ways described in the literature [16,37,24] for converting an image boundary into a time series. One approach that we initially attempted was to encode the image boundary as a time series based on the radial distance of the boundary from the image center.…”
Section: Lessons Learned (mentioning)
confidence: 99%
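
The radial-distance encoding mentioned in the last statement can be illustrated with a minimal Python sketch. It is not the implementation used in either paper: the function names, the choice of the boundary centroid as the "image center", and the optional z-normalization step are all assumptions made for illustration.

```python
import math


def radial_distance_series(boundary):
    """Turn an ordered list of (x, y) boundary points into a 1-D series.

    Each value is the Euclidean distance from a boundary point to the
    centroid of all boundary points (an assumed stand-in for the
    "image center" in the quoted text).
    """
    if not boundary:
        raise ValueError("boundary must contain at least one point")
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    return [math.hypot(x - cx, y - cy) for x, y in boundary]


def z_normalize(series):
    """Rescale a series to zero mean and unit variance (assumed extra step)."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    if std == 0:
        # A perfectly circular boundary has constant radius.
        return [0.0] * len(series)
    return [(v - mean) / std for v in series]


# Hypothetical example: the outline of a 2x2 square traced in order.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
series = radial_distance_series(square)
normalized = z_normalize(series)
```

For the square above, corners lie farther from the centroid than edge midpoints, so the series alternates between a larger and a smaller value. Two shapes encoded this way can then be compared with any time series distance measure.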