Summary
Uncertain data is widely used in many practical applications, such as data cleaning, location‐based services, and privacy protection. With the development of technology, data is becoming increasingly high‐dimensional. The most common indexes for nearest neighbor search on uncertain data are the R‐Tree and the KD‐Tree, but these indexes inevitably suffer from the "curse of dimensionality." To address this problem, this article proposes a new hashing algorithm, called the LSR‐forest, which is based on locality‐sensitive hashing and the R‐Tree and solves the approximate nearest neighbor search problem for high‐dimensional uncertain data. The LSR‐forest hashes similar high‐dimensional uncertain data into the same bucket with high probability and then constructs multiple R‐Tree‐based indexes over the hashed buckets. At query time, neighbors can be identified by checking the data in the hypercube that contains the query point. The query range can also be adjusted automatically through the parameter k. Many experiments on different datasets are presented in this article, and the results show that the LSR‐forest is more effective and efficient than the R‐Tree on high‐dimensional datasets.
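The general hash‐then‐index idea summarized above can be illustrated with a rough sketch. This is not the LSR‐forest itself. It assumes each uncertain object is given as a list of sample points and collapsed to its mean, it swaps in the standard p‐stable (Gaussian) LSH family h(v) = floor((a · v + b) / w), and it replaces the per‐bucket R‐Tree with a linear scan over the candidates. All names and parameters (`LSHForestSketch`, `num_tables`, `hashes_per_table`, `w`) are hypothetical, and `k` here simply means the number of neighbors returned.

```python
import math
import random
from collections import defaultdict


def expected_point(samples):
    """Assumption: collapse an uncertain object (list of samples) to its mean."""
    dim = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(dim)]


class PStableHash:
    """One p-stable LSH function: h(v) = floor((a . v + b) / w), a ~ N(0, I)."""

    def __init__(self, dim, w, rng):
        self.a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.b = rng.uniform(0.0, w)
        self.w = w

    def __call__(self, v):
        proj = sum(x * y for x, y in zip(self.a, v))
        return math.floor((proj + self.b) / self.w)


class LSHForestSketch:
    """Several hash tables; each bucket key concatenates a few hash values."""

    def __init__(self, dim, num_tables=4, hashes_per_table=3, w=4.0, seed=0):
        rng = random.Random(seed)
        self.tables = []
        for _ in range(num_tables):
            funcs = [PStableHash(dim, w, rng) for _ in range(hashes_per_table)]
            self.tables.append((funcs, defaultdict(list)))
        self.objects = []

    def insert(self, samples):
        idx = len(self.objects)
        self.objects.append(samples)
        p = expected_point(samples)
        for funcs, buckets in self.tables:
            buckets[tuple(h(p) for h in funcs)].append(idx)
        return idx

    def query(self, q, k=1):
        # Gather candidates from the bucket the query falls into in each table.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(tuple(h(q) for h in funcs), []))

        # A linear scan stands in for the per-bucket R-Tree search.
        def dist(i):
            return math.dist(expected_point(self.objects[i]), q)

        return sorted(candidates, key=dist)[:k]
```

Using several tables trades memory for recall: an object missed by one table's bucket can still be found through another, which is the usual motivation for building a "forest" of hash‐based indexes rather than a single one.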