MUD: Mapping-based query processing for high-dimensional uncertain data

Shou, Lidan; Zhang, Xiaolong; Chen, Gang; Yuan, Guandou; Chen, Ke

doi:10.1016/j.ins.2012.02.023

Cited by 5 publications

(3 citation statements)

References 45 publications


“…The reason is APLA‐scan is actually based on the R‐tree, it is not efficient for high‐dimensional uncertain data. Although based on filter methods, LSR forest has a significant improvement compared with mud algorithm, 38 and also has a certain advantage compared with mud+ algorithm 38 …”

Section: Results (mentioning)

confidence: 99%

“…First, the R‐Tree‐based k ‐bound filtering algorithm is used to delete objects that cannot be the result of the query; Second, the probability subset selection algorithm is used to efficiently detect the k subset to quickly filter the set of objects that do not satisfy the condition; Finally, the returned results are filtered by the probability upper bound and lower bound verification methods to further filter the query results. MUD/MUD+ 38 adopts a cost‐effective pruning technique based on a very simple form of probabilistic pruning information, namely, the probabilistic quantiles. They map high‐dimensional uncertain objects to a single‐dimensional space, where the quantiles of uncertain objects can be indexed using the existing single‐dimensional indices such as the B+‐tree.…”

Section: Related Work (mentioning)

confidence: 99%
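The quantile-based mapping described in the statement above can be illustrated with a minimal sketch. It is not the MUD/MUD+ algorithm itself, only an example of the general idea under stated assumptions: each uncertain object is a list of equally weighted sample points, the mapping to one dimension is the Euclidean distance to a single reference point, the probability threshold `theta` is fixed when the index is built, and a sorted Python list searched with `bisect` stands in for a B+-tree. All function names here are hypothetical.

```python
import bisect
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantile_pair(samples, ref, theta):
    """Map samples to 1-D distances from ref and return two quantiles.

    lo: smallest v with Pr[d <= v] >= theta
    hi: largest  v with Pr[d >= v] >= theta
    """
    d = sorted(dist(s, ref) for s in samples)
    k = math.ceil(theta * len(d))  # samples needed to reach mass theta
    return d[k - 1], d[len(d) - k]


def build_index(objects, ref, theta):
    """objects: dict id -> list of sample points. Returns (keys, entries),
    both sorted by the lower quantile (the stand-in for a B+-tree)."""
    entries = sorted(
        (quantile_pair(samples, ref, theta) + (oid,))
        for oid, samples in objects.items()
    )
    keys = [e[0] for e in entries]
    return keys, entries


def candidates(keys, entries, ref, query, radius):
    """Filter step for 'Pr[dist(o, query) <= radius] >= theta'.

    By the triangle inequality, a sample within radius of the query has a
    1-D key in [dq - radius, dq + radius]. An object whose lo quantile lies
    above that interval, or whose hi quantile lies below it, cannot reach
    probability theta and is pruned. Survivors still need exact verification.
    """
    dq = dist(query, ref)
    end = bisect.bisect_right(keys, dq + radius)  # entries with lo <= dq + radius
    return [oid for lo, hi, oid in entries[:end] if hi >= dq - radius]


objects = {
    "A": [(1, 0), (0, 1), (1, 1), (0, 2)],
    "B": [(10, 0), (0, 10), (10, 10), (9, 0)],
}
ref = (0, 0)
keys, entries = build_index(objects, ref, theta=0.5)
near_a = candidates(keys, entries, ref, query=(1, 1), radius=1)
near_b = candidates(keys, entries, ref, query=(10, 0), radius=2)
```

The filter is conservative: it never discards a true answer, but objects it returns may still fail the probability test, so a refinement step on the original high-dimensional samples is required.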


LSR‐forest: An locality sensitive hashing‐based approximate k‐nearest neighbor query algorithm on high‐dimensional uncertain data

Wang, Qian, Yang, et al. 2020

Concurrency and Computation

Summary: Uncertain data is widely used in many practical applications, such as data cleaning, location‐based services, and privacy protection. As technology develops, data tends toward high dimensionality. The most common indexes for nearest neighbor search on uncertain data are the R‐Tree and the KD‐Tree, and these indexes inevitably suffer from the “curse of dimensionality.” To address this problem, the article proposes a new hash algorithm, called the LSR‐forest, which is based on locality sensitive hashing and the R‐Tree, to solve the approximate nearest neighbor search problem on high‐dimensional uncertain data. The LSR‐forest hashes similar high‐dimensional uncertain data into the same bucket with high probability, and then constructs multiple R‐Tree‐based indexes for the hashed buckets. At query time, neighbors can be judged by checking the data in the hypercube that contains the query point. The query range can also be adjusted automatically for different values of the parameter k. Experiments on several datasets are presented in the article. The results show that the LSR‐forest has better effectiveness and efficiency than the R‐Tree on high‐dimensional datasets.
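The bucketing step in the summary above can be sketched as follows. This is an illustration of locality sensitive hashing with random projections, not the LSR‐forest implementation: it assumes each uncertain object is a list of sample points, represents the object by the mean of its samples (an assumption made here for simplicity), and omits the per-bucket R‐Tree entirely. The function names and the default `width`, `num_hashes`, and `seed` values are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, rng):
    """One random-projection hash: h(v) = floor((a . v + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(v):
        return math.floor((sum(x * y for x, y in zip(a, v)) + b) / width)

    return h


def mean_point(samples):
    """Coordinate-wise mean of an object's sample points."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def build_buckets(objects, dim, width=4.0, num_hashes=3, seed=0):
    """Hash each object's mean point; the tuple of hash values is its bucket key."""
    rng = random.Random(seed)
    hashes = [make_hash(dim, width, rng) for _ in range(num_hashes)]
    buckets = defaultdict(list)
    for oid, samples in objects.items():
        key = tuple(h(mean_point(samples)) for h in hashes)
        buckets[key].append(oid)
    return hashes, buckets


def lookup(hashes, buckets, query):
    """Return the ids stored in the bucket the query point hashes to."""
    key = tuple(h(query) for h in hashes)
    return buckets.get(key, [])


objects = {
    "A": [(1.0, 2.0, 3.0), (3.0, 2.0, 1.0)],  # mean is (2, 2, 2)
    "B": [(2.0, 2.0, 2.0)],                   # mean is (2, 2, 2)
}
hashes, buckets = build_buckets(objects, dim=3)
found = lookup(hashes, buckets, [2.0, 2.0, 2.0])
```

Objects with identical representative points always share a bucket; nearby points share one only with some probability, which is why schemes of this kind typically build several independent tables and inspect the matching bucket in each.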


“…Furthermore, measurement values are continuously changing because of the positions of instrumentation devices or workers’ conditions. Aside from these examples, data randomness, missing data, delayed updates, and worker fatigue are other factors of data uncertainty [ 8 , 9 ].…”

Section: Introduction (mentioning)

confidence: 99%

Cluster Validity Index for Uncertain Data Based on a Probabilistic Distance Measure in Feature Space

Baek, Tavakkol, et al. 2023

Sensors

Cluster validity indices (CVIs) for evaluating the result of the optimal number of clusters are critical measures in clustering problems. Most CVIs are designed for typical data-type objects called certain data objects. Certain data objects only have a singular value and include no uncertainty, so they are assumed to be information-abundant in the real world. In this study, new CVIs for uncertain data, based on kernel probabilistic distance measures to calculate the distance between two distributions in feature space, are proposed for uncertain clusters with arbitrary shapes, sub-clusters, and noise in objects. By transforming original uncertain data into kernel spaces, the proposed CVI accurately measures the compactness and separability of a cluster for arbitrary cluster shapes and is robust to noise and outliers in a cluster. The proposed CVI was evaluated for diverse types of simulated and real-life uncertain objects, confirming that the proposed validity indexes in feature space outperform the pre-existing ones in the original space.


Support function machine for set-based classification with application to water quality evaluation

Chen, Xue, et al. 2017

Information Sciences
