Efficient skyline computation over distributed interval data

Li, Xiaoyong; Ren, Kaijun; Yu, Jie

doi:10.1002/cpe.4075

Cited by 6 publications

(3 citation statements)

References 53 publications


“…Uncertainty in real-world data can transpire due to various scenarios; thus some studies have proposed algorithms to cater to different domains and environments such as Saad et al [39], [40] who extended the study in [38] to report skyline on uncertain dimensions when given interval queries, Huang [20] who worked on the continuous d ε -skyline query to cater to location-based query for objects with time-varying attributes, Li et al [27], who extensively studied skyline query over distributed interval data, and Ma'aruf et al [30]- [32], who worked on an alternative approach from [38] to cater skyline queries on uncertain data. On a different perspective from the two domains discussed previously, Elmi et al [15], [16] introduced the skyline paradigm that focuses on the evidential database, Dzolkhifli et al [14] worked on analysing interval uncertain data stream with k-means clustering technique, while Dehaki et al [11] proposed a rule-based skyline computation for data in dynamic database.…”

Section: Continuous Uncertainty Model (mentioning)

confidence: 99%

Efficient Skyline Computation on Uncertain Dimensions

et al. 2021


Over the past two decades, the database community has observed growing research interest in preference queries, each with its own techniques, benefits, and drawbacks. Skyline queries are one of them: they report to users the objects that are interesting with respect to their preferences. Yet skyline queries have their limitations. This paper therefore focuses on efficiently extending skyline query processing to support uncertainty in dimensions, which it terms uncertain dimensions. To process skyline queries on data with uncertain dimensions, we propose the SkyQUD algorithm, which partitions the dataset according to the characteristics of each object before skyline dominance tests are performed. In the pruning process, we use a probability threshold τ to control the large skyline size that SkyQUD would otherwise report because of the computed probabilities. The algorithm has been validated through extensive experiments. The results show that skyline queries can be performed effectively on uncertain dimensions and that the proposed algorithm answers queries efficiently and can handle large datasets.
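As a rough illustration of the dominance tests and the threshold τ described above, the following Python sketch computes dominance probabilities when some attribute values are intervals. It assumes smaller values are better, that an uncertain value is uniformly distributed over its interval, and that attributes and objects are mutually independent; under those assumptions an object's skyline probability is the product of (1 − P(u dominates o)) over all other objects u. All function names are hypothetical, and the sketch omits SkyQUD's partitioning step, so it is not the authors' algorithm.

```python
from typing import List, Tuple

# Assumption: each attribute is a closed interval (lo, hi), uniformly distributed;
# lo == hi means an exact value. Smaller values are better on every dimension.
Attr = Tuple[float, float]
Obj = List[Attr]


def _cdf(y: float, a: float, b: float) -> float:
    """P(X <= y) for X uniform on [a, b] (a point mass when a == b)."""
    if a == b:
        return 1.0 if a <= y else 0.0
    return min(1.0, max(0.0, (y - a) / (b - a)))


def _cdf_integral(t: float, a: float, b: float) -> float:
    """Integral of X's CDF from -inf to t; used to average P(X <= y) over uniform Y."""
    if a == b:
        return max(0.0, t - a)
    if t <= a:
        return 0.0
    if t < b:
        return (t - a) ** 2 / (2 * (b - a))
    return (b - a) / 2 + (t - b)


def prob_leq(x: Attr, y: Attr) -> float:
    """P(X <= Y) for independent X ~ U[x], Y ~ U[y]."""
    (a, b), (c, d) = x, y
    if c == d:
        return _cdf(c, a, b)
    return (_cdf_integral(d, a, b) - _cdf_integral(c, a, b)) / (d - c)


def prob_dominates(u: Obj, v: Obj) -> float:
    """P(u dominates v) = P(u <= v everywhere) - P(u == v everywhere)."""
    p = 1.0
    for x, y in zip(u, v):
        p *= prob_leq(x, y)
    # Equality has non-zero probability only if every attribute of both objects
    # is the same exact value; in that case u cannot dominate v.
    if all(x[0] == x[1] == y[0] == y[1] for x, y in zip(u, v)):
        return 0.0
    return p


def skyline_probabilities(objs: List[Obj]) -> List[float]:
    """P(object is dominated by no other object), assuming independent objects."""
    probs = []
    for i, o in enumerate(objs):
        p = 1.0
        for j, u in enumerate(objs):
            if i != j:
                p *= 1.0 - prob_dominates(u, o)
        probs.append(p)
    return probs


def threshold_skyline(objs: List[Obj], tau: float) -> List[int]:
    """Indices of objects whose skyline probability is at least tau."""
    return [i for i, p in enumerate(skyline_probabilities(objs)) if p >= tau]
```

Raising τ shrinks the reported set: objects that are only occasionally undominated, such as one whose uncertain interval straddles a competitor's exact value, drop out first.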



“…Though from previous works [6], [12], [23], [28], the R-based index structures support constrained query, but the selectivity ratio over uncertain data set is yet to be provided. We intend to provide the analysis of the indexing structures in supporting constrained query by varying the selectivity ratio from 0.1% to 99.5%.…”

Section: Performance With Varying Selectivity Ratio (mentioning)

confidence: 99%

Analyses of Indexing Techniques on Uncertain Data With High Dimensionality

et al. 2020


Efficiently solving critical decision-based problems requires processing high-dimensional data. Over the years, modern technological advances have led to an unprecedented volume of uncertain data being captured, which makes it necessary to organize such data for better data-access performance. Indexing techniques for supporting, organizing, and storing uncertain data with high dimensionality have therefore become pertinent. However, the choice of an indexing technique to improve search performance is strongly influenced by the properties of the underlying dataset, the data construction methods employed by the indexing structure, and the query types it supports. This paper conducts an extensive performance analysis of existing indexing techniques, namely the R-tree, R*-tree, and X-tree, to identify the most efficient indexing structure for organizing, storing, and ultimately improving search performance over uncertain data with high dimensionality. The analyses of CPU processing time and number of nodes visited clearly show that the X-tree outperforms the R-tree and R*-tree, and this superiority holds across different dataset sizes, data distributions, numbers of dimensions, and selectivity ratios.

INDEX TERMS Data partitioning, indexing techniques, MBR, uncertain data, high-dimensional data.
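The selectivity ratio mentioned above is the fraction of objects that a constrained (range) query returns. A minimal sketch, assuming each uncertain object is summarized by its MBR and counts as a candidate whenever its MBR intersects the query window; the helper names and the binary search used to size a window for a target ratio are illustrative assumptions, not the paper's experimental setup.

```python
from typing import List, Sequence, Tuple

# Assumption: an MBR is (low corner, high corner), one coordinate per dimension.
MBR = Tuple[Sequence[float], Sequence[float]]


def intersects(a: MBR, b: MBR) -> bool:
    """True if two axis-aligned rectangles overlap (touching boundaries count)."""
    (alo, ahi), (blo, bhi) = a, b
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(alo, ahi, blo, bhi))


def selectivity(objects: List[MBR], query: MBR) -> float:
    """Fraction of objects whose MBR intersects the query window."""
    if not objects:
        return 0.0
    return sum(intersects(o, query) for o in objects) / len(objects)


def window_for_selectivity(objects: List[MBR], center: Sequence[float],
                           target: float, iters: int = 50) -> MBR:
    """Binary-search the half-width r of a hypercube around `center` so that its
    selectivity reaches `target`; selectivity never decreases as r grows."""
    if not objects or not 0.0 <= target <= 1.0:
        raise ValueError("need objects and a target ratio in [0, 1]")

    def cube(r: float) -> MBR:
        return ([c - r for c in center], [c + r for c in center])

    lo, hi = 0.0, 1.0
    while selectivity(objects, cube(hi)) < target:  # grow until the target is met
        hi *= 2
    for _ in range(iters):  # invariant: cube(hi) meets the target
        mid = (lo + hi) / 2
        if selectivity(objects, cube(mid)) >= target:
            hi = mid
        else:
            lo = mid
    return cube(hi)
```

Because selectivity can only grow as the window widens, a binary search over the half-width finds, to within the search tolerance, the smallest window whose ratio reaches a target such as 0.1% or 99.5%.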


“…In the work of Le et al [40], the probabilistic skyline queries are answered in two aspects including defining the interesting probabilistic skyline tuples to users and efficiently finding these tuples without enumerating all possible worlds. In addition, Li et al [41] define the distributed skyline query over interval data and propose two efficient algorithms to retrieve the skylines progressively from the distributed local sites with a highly optimized feedback framework.…”
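One simple way to picture skyline computation over distributed interval data is a coordinator that collects each site's local skyline and filters their union. The sketch below assumes a conservative interval dominance: smaller is better, and object a dominates b only if a's upper bound is no greater than b's lower bound on every dimension and strictly smaller on at least one. Because this relation is transitive and irreflexive, every globally dominated object is also dominated by some local-skyline object, so the union-then-filter step returns the exact global skyline. This is only a naive baseline with hypothetical names; it is not the progressive, feedback-based algorithms that Li et al. propose.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]              # (lower, upper), lower <= upper
Record = Tuple[str, Sequence[Interval]]     # (id, one interval per dimension)


def dominates(a: Sequence[Interval], b: Sequence[Interval]) -> bool:
    """Assumed interval dominance (smaller is better): every value a may take is
    no worse than every value b may take on all dimensions, and strictly better
    on at least one."""
    no_worse = all(ahi <= blo for (_, ahi), (blo, _) in zip(a, b))
    strictly = any(ahi < blo for (_, ahi), (blo, _) in zip(a, b))
    return no_worse and strictly


def skyline(records: List[Record]) -> List[Record]:
    """Block-nested-loop skyline under the interval dominance above."""
    return [r for r in records
            if not any(dominates(o[1], r[1]) for o in records if o is not r)]


def distributed_skyline(sites: Dict[str, List[Record]]) -> List[Record]:
    """Each site ships only its local skyline; the coordinator filters the union.
    Exact because the dominance relation is transitive and irreflexive."""
    candidates = [r for local in sites.values() for r in skyline(local)]
    return skyline(candidates)
```

Only local skylines cross the network here; reducing that traffic further, for example by feeding strong candidates back to the sites before they report, is the kind of optimization a feedback framework targets.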

Section: Skyline Queries Over Uncertain Data Streams (mentioning)

confidence: 99%

Parallelizing uncertain skyline computation against n‐of‐N data streaming model

Liu, Ren, et al. 2018

Concurrency and Computation

Self Cite


The skyline query over uncertain data streams, an important aspect of big data analysis, plays a significant role in domains such as environment monitoring, decision-making, and data mining. The skyline query over uncertain data streams with the sliding window model always focuses on the most recent N streaming items, which cannot meet the query requirements of different window scales at the same time. To improve query flexibility and efficiency, we propose an efficient parallel method for processing uncertain n-of-N skyline queries, that is, computing the skyline for the most recent n (∀n ≤ N) items in parallel. Specifically, we first propose a framework for parallelizing the computation of uncertain n-of-N skylines. Furthermore, we put forward a sliding window partitioning strategy and a streaming-item mapping strategy to balance the load across nodes. In addition, we define RST, a spatial index structure based on the R-tree, to organize the elements within each individual sliding window and the candidate set in each compute node, which significantly speeds up dominance tests. Most importantly, we provide an encoding interval scheme that transforms the n-of-N query into a stabbing query on each compute node, which greatly narrows the query scope and improves query efficiency. We also use a red-black tree, named RBI, to store all stabbing intervals. Extensive experimental results demonstrate that the proposals are efficient and meet the query requirements of users in real applications.

KEYWORDS data streams, n-of-N model, parallel queries, skyline queries, uncertain data

INTRODUCTION

With the fast development of computer technology and readily available network services, uncertain data queries have received extensive attention in many practical applications in domains such as location-based services [1], RFID networks [2], online shopping [3], and radar detection [4]. Uncertain data is inherent in these applications due to various factors [5], such as data randomness and incompleteness, limited measuring facilities, loss during data transmission, and interference from the external environment. Moreover, uncertain data in these applications is often generated dynamically and continuously, gradually evolving into uncertain data streams. For example, in online shopping applications, information about goods is usually updated continuously, and uncertain data such as customer satisfaction scores are collected dynamically from multiple websites. As another example, in resource detection with radar, large volumes of geological and oceanographic data are generated continuously and transmitted to processing systems in real time. It is therefore very important to analyze large collections of uncertain streaming data efficiently, given the significance of such applications and the characteristics of uncertain data streams, such as real-time arrival, data uncertainty, and single-pass scanning. The skyline query is a typical que...
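The encoding interval scheme mentioned in the abstract builds on a known idea for certain (non-probabilistic) data: label items by arrival position, and give each item k that no newer item dominates the interval (a, k], where a is the label of the newest older item that dominates it. The item belongs to the skyline of the n most recent items exactly when the point M − n + 1 stabs that interval, M being the newest label. The Python sketch below shows only this deterministic version with brute-force dominance checks and a linear-scan stabbing query; it does not model uncertainty, parallel partitioning, RST, or the RBI red-black tree, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Encoded = Tuple[int, int, Point]  # (a, k, item): the item's interval is (a, k]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse on every dimension, strictly better on one (min is better)."""
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))


def encode_intervals(window: Sequence[Point], first_label: int) -> List[Encoded]:
    """`window` holds the N most recent items, oldest first; item i has label
    first_label + i. Items dominated by a newer item are dropped, because they
    cannot appear in any n-of-N skyline that contains them. For each remaining
    item, a is the label of the newest older item dominating it, or
    first_label - 1 when there is none."""
    encoded = []
    for i, e in enumerate(window):
        if any(dominates(f, e) for f in window[i + 1:]):
            continue
        a = first_label - 1
        for j in range(i - 1, -1, -1):  # scan older items, newest first
            if dominates(window[j], e):
                a = first_label + j
                break
        encoded.append((a, first_label + i, e))
    return encoded


def n_of_n_skyline(encoded: List[Encoded], last_label: int, n: int) -> List[Point]:
    """Skyline of the n most recent items: every item whose interval (a, k] is
    stabbed by last_label - n + 1 (linear scan; requires 1 <= n <= N)."""
    q = last_label - n + 1
    return [e for a, k, e in encoded if a < q <= k]
```

In practice the intervals would be kept in an interval tree or, as the abstract describes, a red-black tree, so each n-of-N query becomes a stabbing lookup instead of a recomputation of the skyline.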

