Introduction
We introduce a first-order language with real polynomial arithmetic and aggregation operators (count, iterated sum and multiply), which is well suited for the definition of aggregate queries involving complex statistical functions. It offers a good trade-off between expressive power and complexity, with a tractable data complexity. Interestingly, some fundamental properties of first-order logic with real arithmetic are preserved in the presence of aggregates. In particular, there is an effective quantifier elimination for formulae with aggregation.
We consider the problem of querying data that has already been aggregated in aggregate views, and focus on queries with an aggregation over a conjunctive query. Our main conceptual contribution is the introduction of a new equivalence relation among conjunctive queries, the isomorphism modulo a product. We prove that the equivalence of aggregate queries, such as averages, reduces to it. Deciding if two queries are isomorphic modulo a product is shown to be NP-complete. We then show that the problem of complete rewriting of count queries using count views is also NP-complete. Finally, we introduce new rewriting techniques based on the isomorphism modulo a product to recover the values of counts by complex arithmetical computation from the views. We conclude by showing how these techniques can be used to perform automatic aggregation.
The manipulation of aggregate data has gained considerable interest in recent years, for its great impact in various applications such as data warehousing. In such applications, queries involve aggregation over evolving data of very large size.
The use of materialized aggregate views might strongly increase the efficiency of query processing.
The modeling and the manipulation of statistical data have been studied with different focus both in the field of statistical databases [Su83, SW85, OOM87, Gho86, RR93, RBT96], and in the field of on-line analytical processing (OLAP) [GBLP96, HRU96, LS97, Sho97]. The real challenge of this sort of data is caused by the rather intricate semantics of summary values, which is not handled by classical database systems. A fundamental problem of statistical databases is to determine what can be derived from the statistical data.
In this paper, we present a first-order language for expressing general aggregate queries involving complex statistical functions. The language is based on real polynomial arithmetic together with aggregate operators that count, sum and multiply values in multisets. We first consider first-order logic with this signature, FO_R^agg, and prove that every property that can be expressed with the aggregates can be expressed without aggregation. In other words, the logic FO_R with polynomials over the reals, and its extension FO_R^agg to aggregate functions, coincide. This observation has fundamental consequences. In particular, there is an effective quantifier elimination method for formulae with real arithmetic and aggregation in FO_R^agg. As a query language, it i...
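As a rough illustration of how statistical functions can be built from the three aggregate operators named above (count, sum and multiply over a multiset) combined with polynomial arithmetic, here is a minimal Python sketch. It is not the paper's formal language: the function names `agg_count`, `agg_sum`, `agg_prod`, `mean` and `variance` are hypothetical, and a multiset is simply modelled as a `Counter` mapping each value to its multiplicity.

```python
from collections import Counter
from fractions import Fraction
from math import prod


def agg_count(bag):
    """Number of elements in the multiset, counting multiplicities."""
    return sum(bag.values())


def agg_sum(bag, term):
    """Iterated sum of term(x) over every occurrence of x in the multiset."""
    return sum(term(x) * m for x, m in bag.items())


def agg_prod(bag, term):
    """Iterated product of term(x) over every occurrence of x in the multiset."""
    return prod(term(x) ** m for x, m in bag.items())


def mean(bag):
    """Average, expressed as a ratio of a sum aggregate and a count aggregate."""
    return Fraction(agg_sum(bag, lambda x: x), agg_count(bag))


def variance(bag):
    """Population variance: sum of squares over count, minus the squared mean."""
    n = agg_count(bag)
    return Fraction(agg_sum(bag, lambda x: x * x), n) - mean(bag) ** 2


if __name__ == "__main__":
    values = Counter([2, 4, 4, 6])  # the value 4 occurs twice
    print(agg_count(values), agg_sum(values, lambda x: x), agg_prod(values, lambda x: x))
    print(mean(values), variance(values))
```

The terms passed to `agg_sum` and `agg_prod` here are polynomials in the element value (`x` and `x * x`), which mirrors the restriction to polynomial arithmetic mentioned in the abstract; exact rationals are used only to keep the arithmetic free of rounding.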
We consider the problem of answering queries using only materialized views. We first show that if the views subsume the query from the point of view of the information content, then the query can be answered using only the views, but the resulting query might be extremely inefficient. We then focus on aggregate views and queries over a single relation, which are fundamental in many applications such as data warehousing. We show that in this case, it is possible to guarantee that as soon as the views subsume the query, it can be completely rewritten in terms of the views in a simple query language. Our main contribution is the conception of various rewriting algorithms which run in polynomial time, and the proof of their completeness which relies on combinatorial arguments. Finally, we discuss the choice of materializing or not ratio views such as average and percentage, important for the design of materialized views. We show that it has an impact on the information content, which can be used to protect data, as well as on the maintenance of views.
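The point about ratio views can be made concrete with a small example. The sketch below is only an illustration, not one of the paper's rewriting algorithms: the relation `SALES`, its columns and all function names are hypothetical. It shows that an average per region can be rewritten exactly over a view that materializes sums and counts at a finer grouping, whereas a view that materializes only the averages does not carry enough information.

```python
from collections import defaultdict

# Hypothetical base relation: (region, product, amount).
SALES = [
    ("north", "a", 10),
    ("north", "a", 20),
    ("north", "b", 60),
    ("south", "a", 5),
]


def materialize_sum_count(rows):
    """View keyed by (region, product) holding (SUM(amount), COUNT(*))."""
    view = defaultdict(lambda: [0, 0])
    for region, product, amount in rows:
        cell = view[(region, product)]
        cell[0] += amount
        cell[1] += 1
    return {key: tuple(cell) for key, cell in view.items()}


def avg_by_region_from_view(view):
    """Rewrite AVG(amount) GROUP BY region using only the sum/count view."""
    acc = defaultdict(lambda: [0, 0])
    for (region, _product), (total, n) in view.items():
        acc[region][0] += total
        acc[region][1] += n
    return {region: total / n for region, (total, n) in acc.items()}


def avg_by_region_direct(rows):
    """Reference answer computed on the base relation."""
    acc = defaultdict(lambda: [0, 0])
    for region, _product, amount in rows:
        acc[region][0] += amount
        acc[region][1] += 1
    return {region: total / n for region, (total, n) in acc.items()}


def avg_of_avgs_by_region(view):
    """What one gets from a ratio-only view: averaging the stored averages."""
    acc = defaultdict(list)
    for (region, _product), (total, n) in view.items():
        acc[region].append(total / n)
    return {region: sum(avgs) / len(avgs) for region, avgs in acc.items()}


if __name__ == "__main__":
    view = materialize_sum_count(SALES)
    print(avg_by_region_from_view(view))  # agrees with the direct computation
    print(avg_by_region_direct(SALES))
    print(avg_of_avgs_by_region(view))    # differs for "north"
```

In this toy data the "north" groups have different sizes, so averaging the per-product averages gives a different number from the true regional average. This is one way to see why the decision to materialize a ratio view, rather than its numerator and denominator, affects what can be derived from the views.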
Haplotype data play a relevant role in several genetic studies, e.g., mapping of complex disease genes, drug design, and evolutionary studies on populations. However, the experimental determination of haplotypes is expensive and time-consuming. This motivates the increasing interest in techniques for inferring haplotype data from genotypes, which can instead be obtained quickly and economically. Several such techniques are based on the maximum parsimony principle, which has been justified by both experimental results and theoretical arguments. However, the problem of haplotype inference by parsimony was shown to be NP-hard, thus limiting the applicability of exact parsimony-based techniques to relatively small data sets. In this paper, we introduce the collapse rule, a generalization of the well-known Clark's rule, and describe a new heuristic algorithm for haplotype inference (implemented in a program called CollHaps), based on parsimony and the iterative application of collapse rules. The performance of CollHaps is tested on several data sets. The experiments show that CollHaps enables the user to process large data sets obtaining very "parsimonious" solutions in short processing times. They also show a correlation, especially for large data sets, between parsimony and correct reconstruction, supporting the validity of the parsimony principle to produce accurate solutions.
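For readers unfamiliar with Clark's rule, which the collapse rule generalizes, the following is a minimal Python sketch of the classical rule only. It does not implement the collapse rule or CollHaps, whose details are not given in this abstract. The encoding is an assumption chosen for illustration: a haplotype is a tuple of 0/1 alleles, and a genotype is a tuple over {0, 1, 2} where 0 and 1 denote homozygous sites and 2 a heterozygous site. All function names are hypothetical.

```python
def is_compatible(haplotype, genotype):
    """A haplotype can explain a genotype if it matches every homozygous site."""
    return len(haplotype) == len(genotype) and all(
        g == 2 or g == h for h, g in zip(haplotype, genotype)
    )


def clark_complement(haplotype, genotype):
    """Return the second haplotype that, paired with `haplotype`, yields `genotype`.

    Returns None when the haplotype is not compatible with the genotype.
    """
    if not is_compatible(haplotype, genotype):
        return None
    # At heterozygous sites the partner carries the opposite allele;
    # at homozygous sites both haplotypes carry the same allele.
    return tuple(1 - h if g == 2 else h for h, g in zip(haplotype, genotype))


def apply_clark_rule(seed_haplotypes, genotypes):
    """Greedily resolve genotypes with already known haplotypes.

    Each newly inferred haplotype is added to the known pool, so it can
    help resolve further genotypes. The result depends on processing order.
    """
    known = list(seed_haplotypes)
    resolved = {}
    progress = True
    while progress:
        progress = False
        for g in genotypes:
            if g in resolved:
                continue
            for h in known:
                other = clark_complement(h, g)
                if other is not None:
                    resolved[g] = (h, other)
                    if other not in known:
                        known.append(other)
                    progress = True
                    break
    return resolved, known


if __name__ == "__main__":
    seeds = [(0, 1, 0)]
    genotypes = [(2, 1, 2), (1, 2, 2)]
    resolved, known = apply_clark_rule(seeds, genotypes)
    for g, pair in resolved.items():
        print(g, "->", pair)
```

In the usage example, the seed haplotype resolves the first genotype, and the haplotype inferred from it then resolves the second one. Because the procedure is greedy, a different processing order can lead to a different, possibly less parsimonious, set of haplotypes, and some genotypes may remain unresolved.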