We propose a partitioning scheme for similarity search indexes called Maximal Metric Margin Partitioning (MMMP). MMMP divides the data according to its distribution pattern, paying particular attention to the boundaries of clusters. A partitioning surface created by MMMP tends to lie at the maximum distance from the boundaries of the two clusters it separates. MMMP is the first similarity search index approach to focus on partitioning surfaces and data distribution patterns. We also present an indexing scheme, the MMMP-Index, which combines MMMP with small ball partitioning. The MMMP-Index prunes many objects that are not relevant to a query and thereby reduces the query execution cost. Our experimental results show that MMMP effectively indexes clustered data and reduces the search cost. For clustered vector data, the MMMP-Index reduces the computational cost to less than two thirds of that of comparable schemes.
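The abstract does not say how MMMP constructs its partitioning surfaces, so the following is only an illustrative sketch of the margin idea, not the authors' algorithm. It assumes a ball-shaped surface around a single pivot and places the radius in the middle of the widest gap in the sorted pivot distances. The names `widest_gap_radius` and `partition` are hypothetical.

```python
import math


def widest_gap_radius(points, pivot, dist=math.dist):
    """Pick a ball radius around `pivot` in the middle of the widest gap
    between consecutive pivot distances.

    Returns (radius, margin), where margin is the distance from the
    surface to the nearest object on either side.
    Assumption: this is one simple reading of "maximal margin", chosen
    for illustration only.
    """
    ds = sorted(dist(pivot, p) for p in points)
    if len(ds) < 2:
        raise ValueError("need at least two objects to place a surface")
    best_gap, radius = -1.0, None
    for lo, hi in zip(ds, ds[1:]):
        if hi - lo > best_gap:
            best_gap, radius = hi - lo, (lo + hi) / 2
    return radius, best_gap / 2


def partition(points, pivot, radius, dist=math.dist):
    """Split objects into those inside and those outside the ball."""
    inside = [p for p in points if dist(pivot, p) <= radius]
    outside = [p for p in points if dist(pivot, p) > radius]
    return inside, outside


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
    r, m = widest_gap_radius(data, pivot=(0, 0))
    print(r, m, partition(data, (0, 0), r))
```

With a wide margin, a range query whose ball does not reach the surface only needs to visit one side of the partition, which is the kind of pruning the abstract attributes to the MMMP-Index.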
We investigated the problem of reducing the cost of searching for the k closest pairs in metric spaces. In general, a k-closest pair search method initializes the upper bound distance of the k closest pairs to infinity and updates it whenever it finds pairs of objects whose distances are shorter than the current bound. Furthermore, it prunes dissimilar pairs whose distances are estimated to be longer than the upper bound distance, using the distances from a pivot to the objects and the triangle inequality. The cost of a k-closest pair query is smaller when the upper bound distance is shorter and the distribution of distances between the pivot and the objects is sparser. We propose a new divide-and-conquer-based k-closest pair search method in metric spaces, called Adaptive Multi-Partitioning (AMP). AMP repeatedly divides and conquers objects starting from the sparser regions of the distance distribution, which speeds up the convergence of the upper bound distance before the denser regions are partitioned. As a result, AMP prunes more dissimilar pairs than ordinary divide-and-conquer-based methods. We compare our method with other partitioning methods and show that AMP reduces the number of distance computations.
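The generic pivot-based pruning described above can be sketched as follows. This is a minimal illustration of the upper bound and the triangle inequality filter, not an implementation of AMP; the function name `k_closest_pairs`, the single-pivot choice, and the Euclidean default metric are assumptions made for the example.

```python
import heapq
import math


def k_closest_pairs(points, k, dist=math.dist, pivot=None):
    """Return (pairs, n_dist): the k closest pairs as sorted
    (distance, i, j) tuples and the number of distance computations.

    Pruning rule: |d(p, x) - d(p, y)| <= d(x, y) by the triangle
    inequality, so a pair whose pivot distances differ by more than
    the current upper bound cannot be among the k closest pairs.
    """
    if pivot is None:
        pivot = points[0]  # assumption: any object may serve as pivot
    pd = [dist(pivot, p) for p in points]
    n_dist = len(points)
    order = sorted(range(len(points)), key=lambda i: pd[i])

    heap = []  # max-heap (negated distances) of the best pairs so far
    upper = math.inf  # upper bound distance, starts at infinity
    for a in range(len(order)):
        i = order[a]
        for b in range(a + 1, len(order)):
            j = order[b]
            if pd[j] - pd[i] > upper:
                break  # sorted by pivot distance: later j are pruned too
            d = dist(points[i], points[j])
            n_dist += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, i, j))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i, j))
            if len(heap) == k:
                upper = -heap[0][0]  # tighten the bound
    pairs = sorted((-nd, min(i, j), max(i, j)) for nd, i, j in heap)
    return pairs, n_dist


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 5.5), (10, 10), (20, 20)]
    print(k_closest_pairs(pts, k=2))
```

The sketch shows why a quickly shrinking upper bound matters: the inner loop stops as soon as the gap in pivot distances exceeds the bound, so the earlier the bound tightens, the fewer pairwise distances are computed.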