Our goal is to measure the likelihood that numerical constraints are
satisfied in the absence of prior information. We study expressive
constraints, involving arithmetic and complex numerical functions, and
even quantification over numbers. Such problems arise in processing
incomplete data, or analyzing conditions in programs without a priori
bounds on variables. We show that for constraints on n variables,
the proper way to define such a measure is as the limit, as the radius
grows, of the fraction of the n-dimensional ball that consists of points
satisfying the constraints. We prove that the existence of
such a limit is closely related to the notion of o-minimality from
model theory. Thus, for constraints definable with the usual
arithmetic and exponentiation, the likelihood is well defined, but
adding trigonometric functions is problematic. We look at computing
and approximating such likelihoods for order and linear
constraints, and prove an impossibility result for approximation with
multiplicative error. However, since the likelihood is a number between
0 and 1, an approximation scheme with additive error is acceptable, and
we give one for arbitrary linear constraints.
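To make the measure concrete, here is a minimal Python sketch, not the authors' algorithm, of a generic additive-error estimator: sample points uniformly from the n-dimensional ball of radius R and count how many satisfy the constraint. By Hoeffding's inequality, ceil(ln(2/δ)/(2ε²)) samples give an estimate within ε of the fraction at radius R with probability at least 1 − δ. The function names, the fixed seed, and the use of a single finite R as a stand-in for the limit are assumptions made for illustration. For a constraint such as x1 < x2 < x3, whose solution set is a cone, the fraction is the same at every radius; for constraints with constant offsets, R would have to be large relative to those constants.

```python
import math
import random


def sample_ball(n, R, rng):
    """Draw one point uniformly from the n-dimensional ball of radius R."""
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    # Scaling by R * U^(1/n) makes the point uniform in volume, not just direction.
    r = R * rng.random() ** (1.0 / n)
    return [r * x / norm for x in v]


def hoeffding_samples(eps, delta):
    """Samples needed so the empirical fraction is within eps w.p. >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def estimate_likelihood(constraint, n, R, eps, delta, seed=0):
    """Monte Carlo estimate of the fraction of the radius-R ball satisfying constraint.

    Hypothetical helper: a finite R only approximates the limit R -> infinity.
    """
    rng = random.Random(seed)
    m = hoeffding_samples(eps, delta)
    hits = sum(1 for _ in range(m) if constraint(sample_ball(n, R, rng)))
    return hits / m


def increasing_order(x):
    """Order constraint x1 < x2 < x3 on three variables."""
    return x[0] < x[1] < x[2]


if __name__ == "__main__":
    print(estimate_likelihood(increasing_order, n=3, R=1.0, eps=0.01, delta=0.01))
```

For the order constraint x1 < x2 < x3, permuting coordinates maps the ball onto itself, so each of the 3! orderings occupies the same volume and the true likelihood is 1/6; the printed estimate should land close to that value.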
The standard notion of query answering over incomplete databases is that of certain answers, which guarantee correctness regardless of how incomplete data is interpreted. In the majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario, where we have no prior information about the values of numerical attributes, similarly to the predominant approach to handling incomplete data, which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as the real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of certain subsets of Euclidean space. We show that this measure is well defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results confirming the feasibility of this approach.
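As a worked illustration of why certainty is tied to the asymptotic behavior of volumes, consider a hypothetical tuple whose two numerical attributes a and b are both null, and a query condition a + b > 10. Reading the nulls as a point (x, y) in the plane, the part of the disc of radius R where x + y > 10 is a circular segment with area R² arccos(d/R) − d√(R² − d²), where d = 10/√2 is the distance from the origin to the line x + y = 10. The Python sketch below (the attribute names, the constant 10, and the function name are all assumptions) computes that fraction at several radii. It grows toward 1/2, the value the limit-of-volumes definition would assign to this answer in this toy setting, even though no uniform distribution over all of the plane exists.

```python
import math


def certainty_in_disc(R: float, c: float = 10.0) -> float:
    """Fraction of the disc of radius R where x + y > c (assumes c >= 0).

    Hypothetical helper for the condition a + b > c with both a and b null.
    """
    d = c / math.sqrt(2.0)  # distance from the origin to the line x + y = c
    if R <= d:
        return 0.0  # the disc does not reach the half-plane x + y > c
    # Area of the circular segment cut off by the line at distance d.
    segment = R * R * math.acos(d / R) - d * math.sqrt(R * R - d * d)
    return segment / (math.pi * R * R)


if __name__ == "__main__":
    for R in (10.0, 100.0, 1000.0, 1e6):
        print(R, certainty_in_disc(R))
```

The constant 10 only shifts the line away from the origin, so its influence on the fraction shrinks roughly like 1/R; this is why offsets in such conditions do not affect the limiting value.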