Representing and Querying Correlated Tuples in Probabilistic Databases

Sen, Prithviraj; Deshpande, Amol

doi:10.1109/icde.2007.367905

Cited by 162 publications

(187 citation statements)

References 33 publications



“…A similar concept is used in many tuple uncertainty models to track correlations between tuples. [9] uses lineage and [14] uses factor tables to capture such dependencies. As we are interested in capturing historical dependencies between attributes of tuples, our concept of dependencies is different from this related work, which capture these dependencies on a per tuple basis.…”

Section: History (mentioning)

confidence: 99%

“…The ProbView system [16] took a similar approach by propagating the formulas necessary to evaluating the resulting probabilities. Sen et al have more recently proposed an alternative approach to represent tuple correlations using probabilistic graphical models [14]. They use factored representations of the relations to represent their dependencies.…”

Section: Overhead Of Histories (mentioning)

confidence: 99%


Database Support for Probabilistic Attributes and Tuples

Singh, Mayfield, Shah, et al. 2008

2008 IEEE 24th International Conference on Data Engineering


Abstract: The inherent uncertainty of data present in numerous applications, such as sensor databases, text annotations, and information retrieval, motivates the need to handle imprecise data at the database level. Uncertainty can be at the attribute or tuple level and is present in both continuous and discrete data domains. This paper presents a model for handling arbitrary probabilistic uncertain data (both discrete and continuous) natively at the database level. Our approach leads to a natural and efficient representation for probabilistic data. We develop a model that is consistent with possible worlds semantics and closed under basic relational operators. This is the first model that accurately and efficiently handles both continuous and discrete uncertainty. The model is implemented in a real database system (PostgreSQL), and the effectiveness and efficiency of our approach are validated experimentally.


“…However, because of the independence assumption, it is non-trivial to extend our exact algorithm to tackle the problem against dataset with correlations. As a possible future work, we will consider to develop efficient exact algorithm based on the graph model [38,15] which can effectively capture the correlations of the uncertain dataset.…”

Section: Discussion (mentioning)

confidence: 99%

“…Uncertainty is inherent in such applications due to various factors such as data randomness and incompleteness, limitation of equipment, and delay or loss in data transfer. A number of issues have been recently addressed; these include modeling uncertainty [2,36], query evaluation [10,13,14,37], indexing [11,41], top-k queries [22,35,39,42], skyline queries [34], joins [26,27], nearest neighbor query [5,9,27], clustering [28,30], etc.…”

Section: Introduction (mentioning)

confidence: 99%

Threshold-based probabilistic top-k dominating queries

et al. 2009


Recently, due to intrinsic characteristics in many underlying data sets, a number of probabilistic queries on uncertain data have been investigated. Top-k dominating queries are very important in many applications including decision making in a multidimensional space. In this paper, we study the problem of efficiently computing top-k dominating queries on uncertain data. We first formally define the problem. Then, we develop an efficient, threshold-based algorithm to compute the exact solution. To overcome some inherent computational deficiency in an exact computation, we develop an efficient randomized algorithm with an accuracy guarantee. Our extensive experiments demonstrate that both algorithms are quite efficient, while the randomized algorithm is quite scalable against data set sizes, object areas, k values, etc. The randomized algorithm is also highly accurate in practice.


“…General issues in modelling and managing uncertain data are addressed by Dey and Sarkar in [4], Lee in [11], and Antova, Koch, and Olteanu in [6]. Querying uncertain data by the probabilistic paradigm has been investigated by Dalvi and Suciu in [2] and Sen and Deshpande in [17]. Very recently Dalvi and Suciu [3] have shown that the problem of query evaluation over probabilistic databases is either PTIME or #P-complete.…”

Section: Related Work (mentioning)

confidence: 99%

Probabilistic Threshold Range Aggregate Query Processing over Uncertain Data

Yang, Zhang, et al. 2009

Advances in Data and Web Management


Abstract. A large amount of uncertain data is inherent in many novel and important applications such as sensor data analysis and mobile data management. A probabilistic threshold range aggregate (PTRA) query retrieves summarized information about the uncertain objects satisfying a range query, with respect to a given probability threshold. This paper is the first to address this important type of query. We develop a new index structure, the aU-tree, and propose an exact querying algorithm based on it. In pursuit of efficiency, two techniques, SingleSample and DoubleSample, are developed. Both techniques provide approximate answers to a PTRA query with an accuracy guarantee. An experimental study demonstrates the efficiency and effectiveness of our proposed methods.

