The analysis of visual information often involves manipulating enormous volumes of data. If some tolerance is allowed in the results, appropriate selective processing, which does not necessarily consider every data feature, can improve the efficiency of such analysis by orders of magnitude. To guarantee that the error introduced does not exceed the allowed limit, a certain minimum proportion of the data must be involved in the analysis. This proportion cannot be chosen arbitrarily; it should be determined by formal methods that take into account the error inherent in the data. This paper presents techniques for improving retrieval efficiency in image-based information systems, with performance guarantees on the reliability of the results. Using the statistical theory of occupancy, it develops a model for formally selecting the minimal subset of image features to be involved in histogram-based similarity evaluation. This guarantees that decisions based on this minimal subset are always the same as (or close to) those that would have been reached by considering all the features. Results on real and simulated data demonstrate the model's performance in terms of speedup, robustness, scalability, and performance guarantees.
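To make the occupancy idea concrete, the following is a minimal sketch, not the paper's model. It assumes the simplest version of the classical occupancy problem: each sampled image feature falls independently and uniformly into one of `n_bins` histogram bins. Under that assumption, the expected number of occupied bins after `n` samples is `k * (1 - (1 - 1/k)**n)`. The function names `expected_occupied_bins` and `min_samples_for_coverage` are hypothetical and introduced only for illustration.

```python
import math


def expected_occupied_bins(n_samples: int, n_bins: int) -> float:
    """Expected number of distinct bins hit when n_samples items fall
    independently and uniformly into n_bins bins (classical occupancy).

    Hypothetical helper; assumes equal bin probabilities."""
    if n_bins < 1 or n_samples < 0:
        raise ValueError("need n_bins >= 1 and n_samples >= 0")
    return n_bins * (1.0 - (1.0 - 1.0 / n_bins) ** n_samples)


def min_samples_for_coverage(n_bins: int, coverage: float) -> int:
    """Smallest n whose expected fraction of occupied bins is >= coverage.

    Solves 1 - (1 - 1/k)^n >= c, i.e. n >= log(1 - c) / log(1 - 1/k).
    Hypothetical helper; assumes equal bin probabilities."""
    if n_bins < 2:
        raise ValueError("need at least two bins")
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    n = math.ceil(math.log(1.0 - coverage) / math.log(1.0 - 1.0 / n_bins))
    # Guard against floating-point rounding at the boundary.
    while expected_occupied_bins(n, n_bins) / n_bins < coverage:
        n += 1
    while n > 0 and expected_occupied_bins(n - 1, n_bins) / n_bins >= coverage:
        n -= 1
    return n


if __name__ == "__main__":
    # Example: a 64-bin colour histogram, 95% expected bin coverage.
    print(min_samples_for_coverage(64, 0.95))
```

Real image histograms are far from uniform, so a practical model would have to account for unequal bin probabilities and for the error inherent in the data. This sketch only illustrates why a formally derived minimum sample size, rather than an arbitrary one, is needed.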