2010
DOI: 10.1007/978-3-642-12275-0_22
BASIL: Effective Near-Duplicate Image Detection Using Gene Sequence Alignment

Abstract: Finding near-duplicate images is a task often found in Multimedia Information Retrieval (MIR). Toward this effort, we propose a novel idea that bridges two seemingly unrelated fields, MIR and Biology. That is, we propose to use the popular gene sequence alignment algorithm in Biology, i.e., BLAST, to detect near-duplicate images. Under this idea, we study how various image features and gene sequence generation methods (using gene alphabets such as A, C, G, and T in DNA sequences) affect the accu…
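The core idea of the abstract, encoding image features as strings over the DNA alphabet so that sequence-alignment tools can compare them, can be sketched as follows. This is an illustrative quantization only: the bucket boundaries, the feature choice, and the function name are assumptions for this example, not the encoding used in the BASIL paper.

```python
# Illustrative sketch: quantize a feature vector into the 4-letter gene
# alphabet {A, C, G, T} so that sequence-alignment tools (e.g., BLAST-style
# aligners) can compare images. The equal-width buckets over [0, 255] are
# an assumption for this example, not the BASIL paper's exact encoding.

GENE_ALPHABET = "ACGT"

def features_to_sequence(features, lo=0.0, hi=255.0):
    """Map each feature value into one of 4 equal-width buckets -> A/C/G/T."""
    width = (hi - lo) / len(GENE_ALPHABET)
    seq = []
    for v in features:
        # Clamp the top of the range into the last bucket.
        idx = min(int((v - lo) / width), len(GENE_ALPHABET) - 1)
        seq.append(GENE_ALPHABET[idx])
    return "".join(seq)

if __name__ == "__main__":
    # Two near-duplicate feature vectors (e.g., block-average intensities)
    # differ only slightly, so they quantize to the same gene sequence.
    original = [10, 80, 150, 220, 30, 200]
    near_dup = [12, 78, 155, 215, 28, 205]
    print(features_to_sequence(original))  # ACGTAT
    print(features_to_sequence(near_dup))  # ACGTAT
```

Because small photometric edits usually move feature values within the same bucket, near-duplicates yield identical or highly similar sequences, which an alignment algorithm then scores.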

Cited by 15 publications (6 citation statements) · References 15 publications
“…Most NDD papers describe NDD models or general principles of the NDD task, but not tools. There are no general reviews describing the application of NDD tools (only of general algorithms), and most algorithms [23, 39–43] refer to near-copy detection of fraudulent images, near-duplicate genetic-sequence detection, and other applications. The small datasets were suitable for generating a manual reference of the homogeneous patterns, while the large datasets were suitable for testing flowSim when there are thousands of patterns to filter. Our first tests evaluated flowSim against the manual groupings, suggesting that flowSim is fit for the NDD task and is tunable to a user's desired level of similarity given a dataset's composition.…”
Section: Discussion
Confidence: 99%
“…Watermarking and fingerprinting methods rely on embedding a signature within the original document before its dissemination [8], while content-based methods rely on analyzing the document's content to extract relevant visual features. Regardless of the general philosophy, in the past decade we have seen some progress toward the development of effective systems to identify the cohabiting versions of images [1], [2] and videos [3], [4] in the wild. However, only recently have there been the first attempts to go beyond NDDR with techniques to identify the structure of relationships within a set of near-duplicates [5]–[7].…”
Section: Related Work
Confidence: 99%
“…With the increasing popularity of image and video sharing services, several research groups have presented solutions for the detection and recognition of near-duplicate (NDDR) images [1], [2] and videos [3], [4] in the last decade. NDDR techniques are the first step for several applications, such as reducing document versions and protecting copyright and intellectual property.…”
Section: Introduction
Confidence: 99%
“…ND images have a diversity of understandings and definitions. Near-duplicate images are defined in Sebastiano Battiato [2013], Hung-sik Kim [2010], Ondrej Chum [2008], Dong-Qing Zhang [2004], and Y. Ke et al. [2004].…”
Section: Introduction
Confidence: 99%
“…The definitions of near-duplicate images given by these authors are:

Sebastiano Battiato et al. [1]: based on the degree of variability (photometric and geometric) that is considered acceptable for each particular application.
Hung-sik Kim et al. [2]: given a set of query images Iq and a collection of source images Is, for each query image iq (∈ Iq), find all images Ir (⊆ Is) that are "near-duplicate" to iq.
Ondrej Chum et al. [3]: an image is a near-duplicate of a reference image if it is close, according to some defined measure.
Dong-Qing Zhang and Shih-Fu Chang [4]: a pair of images in which one is similar to the exact duplicate of the other, but differs slightly due to variations in acquisition times, capturing conditions, editing operations, and rendering conditions.
Y. Ke et al. [5]: images obtained by slightly modifying the original ones through common transformations such as changing contrast or saturation, scaling, cropping, etc.

It is understandable that what counts as a near-duplicate image varies depending on which photometric and geometric variations are deemed acceptable.…”
Section: Introduction
Confidence: 99%
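Chum et al.'s threshold-style definition above ("close, according to some defined measure") can be sketched in a few lines. The Euclidean distance over feature vectors and the threshold value are assumptions chosen for illustration; any distance measure and cutoff could be substituted.

```python
import math

# Minimal sketch of a threshold-based near-duplicate test in the spirit of
# Chum et al.'s definition: an image is a near-duplicate of a reference if
# it is close according to some defined measure. The distance function
# (Euclidean) and the threshold are illustrative assumptions.

def is_near_duplicate(feat_a, feat_b, threshold=10.0):
    """Return True if the feature vectors are within `threshold` of each other."""
    return math.dist(feat_a, feat_b) <= threshold

if __name__ == "__main__":
    print(is_near_duplicate([10, 80, 150], [12, 78, 151]))  # True: a small edit
    print(is_near_duplicate([10, 80, 150], [200, 5, 90]))   # False: a different image
```

Each definition in the table effectively fixes a different choice of measure and acceptable-variation threshold, which is why the notion of "near-duplicate" is application-dependent.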