In a patch-based object recognition system, the key role of a visual vocabulary is to map low-level features into a fixed-length vector in histogram space, to which standard classifiers can be applied directly. The discriminative power of a visual vocabulary determines the quality of the vocabulary model, whereas the size of the vocabulary controls the complexity of the model. A compact visual vocabulary provides a lower-dimensional representation, whereas a large vocabulary may overfit to the distribution of visual words in an image and lead to a heavy computational load. The generic bag-of-features framework follows a standard routine: extract local image descriptors, then cluster them with a user-designated number of clusters. The problem with this routine is that constructing a vocabulary for every individual dataset is inefficient. The vocabulary is usually constructed by cluster analysis using the K-means algorithm. One drawback of K-means, however, is the need to choose a suitable value for K, which determines the size of the visual vocabulary. The size of a vocabulary should balance recognition rate against computational cost. In this paper we propose a two-stage approach that maps an initial high-dimensional vocabulary into a compact vocabulary while maintaining its discriminative power. Using an initial larger vocabulary, we first represent the training images with a coding scheme that maps the importance of each visual word within an image as visual bits. These sets of visual bits then form a sparse representation of every visual word with respect to the set of category-specific training images used for the compression. We have tested our vocabulary compression technique on four computer vision datasets: (i) Xerox7, (ii) PASCAL VOC Challenge 2007, (iii) UIUC texture, and (iv) MPEG7 CE Shape-1 Part B Silhouette. Test results show that the compressed vocabulary slightly outperforms vocabularies learnt by K-means while being just half the size of the initial vocabulary. Our compression technique could help reduce larger vocabularies to fewer visual words with stable performance.
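To illustrate the standard bag-of-features routine described above (not the proposed compression method), the following minimal Python sketch clusters local descriptors with plain K-means and maps an image's descriptors to a fixed-length, normalized histogram. The function names `build_vocabulary`, `encode_image`, and `nearest_word`, the parameters `iterations` and `seed`, and the use of squared Euclidean distance are assumptions made for illustration; the abstract does not specify them.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda k: squared_distance(descriptor, vocabulary[k]),
    )


def build_vocabulary(descriptors, k, iterations=20, seed=0):
    """Plain K-means over pooled local descriptors; k is the vocabulary size."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct descriptors from the pool.
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def encode_image(descriptors, vocabulary):
    """Map a variable-size set of descriptors to a fixed-length histogram."""
    histogram = [0.0] * len(vocabulary)
    for d in descriptors:
        histogram[nearest_word(d, vocabulary)] += 1.0
    total = sum(histogram)
    # Normalize so that images with different descriptor counts are comparable.
    return [h / total for h in histogram] if total else histogram
```

Under these assumptions, `build_vocabulary(pooled_descriptors, k)` would be called once on descriptors pooled from the training images, and `encode_image(image_descriptors, vocabulary)` once per image. The length of the resulting vector equals `k`, which is why the choice of K directly sets the dimensionality handed to the classifier.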