Efficient and accurate near-duplicate recognition is an active research area. Identifying near-duplicate images has a wide range of applications, including digital image forensics, web-scale retrieval, and social media analysis. This article introduces a novel near-duplicate image detection model that consists of two stages: (i) feature extraction and (ii) similarity computation. Initially, features are extracted from the image database, covering both area-based and pixel-based features. The area-based features include the contrast context histogram (CCH) descriptors and improved weighted bag of visual words (w-BovW) features; the pixel-based features include texture features such as the proposed local vector pattern. When a query image is given as input, it is subjected to the same feature extraction stage. Then, the extracted features of the query image are compared against the feature vector database via an improved Jaccard similarity evaluation. Thus, near-duplicate images are detected effectively.
KEYWORDS
feature extraction, feature vector database, local vector pattern, near duplicate detection, similarity computation
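The similarity-computation stage summarized in the abstract can be illustrated with a minimal sketch. It uses the standard generalized (weighted) Jaccard similarity on non-negative feature vectors, not the improved variant proposed in this article, whose definition is not given in this section. The function names, the toy feature vectors, and the 0.8 threshold are assumptions made purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def generalized_jaccard(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard generalized Jaccard: sum of element-wise minima over sum of maxima.

    Assumes both vectors are non-negative and of equal length.
    This is NOT the article's improved Jaccard measure.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    if any(x < 0 for x in a) or any(y < 0 for y in b):
        raise ValueError("feature vectors must be non-negative")
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero vectors are treated as identical.
    return 1.0 if denominator == 0 else numerator / denominator


def find_near_duplicates(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the article
) -> List[Tuple[str, float]]:
    """Return (image id, similarity) pairs at or above the threshold, best first."""
    scored = [(name, generalized_jaccard(query, vec)) for name, vec in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy feature vector database (hypothetical values).
feature_db = {
    "img_a": [1.0, 2.0, 0.0],
    "img_b": [0.0, 0.0, 5.0],
}
matches = find_near_duplicates([1.0, 2.0, 0.0], feature_db)
```

In this sketch a query is compared against every stored vector, which is linear in the database size; an index structure would be needed for the large databases the article targets.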
INTRODUCTION
Nowadays, digital content is widespread and easily redistributed, whether lawfully or unlawfully. For instance, travelers who visit many places take pictures, write travelogues, and post them on the internet; these pictures may be taken from the same place, with or without viewpoint variation. Images posted on the internet can be modified by other web users, who may repost them as their own version.
This generates near-duplicate images. Near-duplicate images are modified versions of original images, created through compression, geometric alterations, content augmentation, blurring, noise pollution, retaining and cutting out parts, and other techniques. The presence of near-duplicate images wastes storage, computing, and transmission resources, and it also critically degrades search engine performance, because users must sift through a huge number of near-duplicate images to find the one they want. Near-duplication of images 1,2 wastes network resources and may indicate illegal activity such as image copyright violation. As a result, effective and efficient near-duplicate detection (NDD) of images is critical in web content security and image management. Extracting useful image features to increase recognition accuracy is one of the key problems in near-duplicate image recognition. 3,4 Another issue is increasing detection efficiency, which is difficult for large image databases.
The development of efficient and accurate NDD algorithms for various types of material such as video, image, text, and speech is a hot research topic. 5,6 Near duplicates play a crucial...