Efficiently Finding Near Duplicate Figures in Archives of Historical Documents

Rakthanmanon, Thanawin; Zhu, Qiaoming; Keogh, Eamonn

doi:10.4304/jmm.7.2.109-123

Cited by 5 publications

(3 citation statements)

References 25 publications

“…Other works mainly employ hand-crafted methods, including Crowley and Zisserman [18] who attempted object retrieval in paintings, Hu and Collomosse [19] used HOG descriptor for sketch-based image retrieval, Rakthanmanon et al [20] introduced a motif discovery algorithm that allows for detecting similar sub-images across documents using the Generalized Hough Transform. En et al [21] proposed a local descriptor-based algorithm for spotting patterns in historical documents and Ginosar et al [22] used four methods for detecting People in Cubist Art using a deformable part model.…”

Section: Related Work (mentioning)

confidence: 99%

Cross-Depicted Historical Motif Categorization and Retrieval with Deep Learning

et al. 2020

In this paper, we tackle the problem of categorizing and identifying cross-depicted historical motifs using recent deep learning techniques, with the aim of developing a content-based image retrieval system. By cross-depiction, we mean the problem that the same object can be represented (depicted) in various ways. The objects of interest in this research are watermarks, which are crucial for dating manuscripts. For watermarks, cross-depiction arises for two reasons: (i) there are many similar representations of the same motif, and (ii) there are several ways of capturing the watermarks, i.e., as the watermarks are not visible on a scan or photograph, they are typically retrieved via hand tracing, rubbing, or special photographic techniques. This leads to different representations of the same (or similar) objects, making it hard for pattern recognition methods to recognize the watermarks. While this is a simple problem for human experts, computer vision techniques have difficulty generalizing across the various depiction possibilities. In this paper, we present a study where we use deep neural networks for categorization of watermarks with varying levels of detail. The macro-averaged F1-score on an imbalanced 12-category classification task is 88.3%, and the multi-labelling performance (Jaccard Index) on a 622-label task is 79.5%. To analyze the usefulness of an image-based system for assisting humanities scholars in cataloguing manuscripts, we also measure the performance of similarity matching on expert-crafted test sets of varying sizes (50 and 1000 watermark samples). A significant outcome is that all relevant results belonging to the same super-class are found by our system (Mean Average Precision of 100%), despite the cross-depicted nature of the motifs. This result has not been achieved in the literature so far.

“…It can perform prediction through algorithm assembling. Li Guo-Hua [11], etc, apply integration between wavelet analysis theory and chaotic theory to predict network data flow of wireless sensor and they have acquired perfect prediction effects. [12] Applies local support vector to implement wireless network flow prediction and the experiments certify the effectiveness of local support vector on network flow prediction.…”

Section: Introduction (mentioning)

confidence: 99%

Traffic Prediction Scheme based on Chaotic Models in Wireless Networks

Feng

2013

JNW

Based on the local support vector algorithm of chaotic time series analysis, the Hannan-Quinn information criterion and SAX symbolization are introduced. A novel prediction algorithm is then proposed and successfully applied to the prediction of wireless network traffic. For the problem of accurately predicting short-term flow from small data sets, the weaknesses of the algorithms during model construction are analyzed through study and comparison with the LDK prediction algorithm. It is verified that the Hannan-Quinn information criterion can be used to calculate the number of neighbor points in place of the previous empirical method, yielding a more accurate prediction model. Finally, actual flow data is used to confirm the accuracy of the proposed algorithm, LSDHQ. Our experiments show that it also has higher performance in adaptability than that of the LSDHQ algorithm.

“…Most of the data found in historical manuscripts are mainly text, but with a few numbers of images. Rakthanmanon et al introduced a scalable system that can detect approximately repeated occurrences of shape patterns both within and between historical texts[51]. This ability to find repeated shapes allowed automatic annotation of…”

mentioning

confidence: 99%

A Review on Near-Duplicate Detection of Images using Computer Vision Techniques

Thyagharajan

Kalaiarasi

2020

Arch Computat Methods Eng

Nowadays, digital content is widespread and easily redistributable, either lawfully or unlawfully. For example, after images are posted on the internet, other web users can modify them and then repost their versions, thereby generating near-duplicate images. The presence of near-duplicates critically affects the performance of search engines. Computer vision is concerned with the automatic extraction, analysis and understanding of useful information from digital images. The main application of computer vision is image understanding. There are several tasks in image understanding, such as feature extraction, object detection, object recognition, image cleaning, image transformation, etc. There is no proper survey in the literature on near-duplicate image detection. In this paper, we review the state-of-the-art computer vision-based approaches and feature extraction methods for the detection of near-duplicate images. We also discuss the main challenges in this field and how other researchers have addressed them. This review provides research directions for fellow researchers who are interested in working in this field.
