We present a novel framework for uncertain data management. We start with a database whose tuple correctness is uncertain and an oracle that can resolve the uncertainty, i.e., decide if a tuple is correct or not. Such an oracle may correspond, e.g., to a data expert or to a crowdsourcing platform. We wish to use the oracle to clean the database with the goal of ensuring the correct answer for specific mission-critical queries. To avoid the prohibitive cost of cleaning the entire database and to minimize the expected number of calls to the oracle, we must carefully select tuples whose resolution would suffice to resolve the uncertainty in query results. In other words, we need a query-guided process for the resolution of uncertain data. We develop an end-to-end solution to this problem, based on the derivation of query answers and on correctness probabilities for the uncertain data. At a high level, we first track Boolean provenance to identify which input tuples contribute to the derivation of each output tuple, and in what ways. We then design an active learning solution for iteratively choosing tuples to resolve, based on the provenance structure and on an evolving estimation of tuple correctness probabilities. We conduct an extensive experimental study to validate our framework in different use cases.
We present a novel framework for uncertain data management, called ActivePDB. We are given a relational probabilistic database, where each tuple is correct with some probability; e.g., a database constructed from textual data using information extraction. We are now given a query and we want to determine the correctness of its results. Unlike probabilistic databases, we have an oracle that can resolve the uncertainty, such as a domain expert that can verify data against their sources. Since verification may be costly, our goal is to determine the correct output of the query, while asking the oracle to verify as few tuples as possible. ActivePDB provides an end-to-end solution to this problem. In a nutshell, we first track provenance to identify which input tuples contribute to the derivation of each output tuple, and in what ways. We then design an active learning solution to iteratively choose tuples to be verified based on the provenance structure and on an evolving estimation of the probability of the tuples correctness. We will demonstrate ActivePDB in the context of the NELL database of extracted facts, allowing participants to both pose queries and play the role of oracles.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.