Query equivalence is investigated for disjunctive aggregate queries with negated subgoals, constants and comparisons. A full characterization of equivalence is given for the aggregation functions count, max, sum, prod, top2 and parity. A related problem is that of determining, for a given natural number N, whether two given queries are equivalent over all databases with at most N constants. This problem is called bounded equivalence. A complete characterization of decidability of bounded equivalence is given. In particular, it is shown that this problem is decidable for all the above aggregation functions as well as for cntd (count distinct) and avg. For quasilinear queries (i.e., queries in which predicates that occur positively are not repeated), it is shown that equivalence can be decided in polynomial time for the aggregation functions count, max, sum, parity, prod, top2 and avg. A similar result holds for cntd provided that a few additional conditions hold. The results are couched in terms of abstract characteristics of aggregation functions, and new proof techniques are used. Finally, the results above also imply that equivalence, under bag-set semantics, is decidable for nonaggregate queries with negation.

• 329 • S. Cohen et al.

quasilinear queries, equivalence boils down to isomorphism, which can be decided in polynomial time.
AGGREGATION FUNCTIONS

An aggregate query is executed in two steps. First, data is collected from a database as specified by the nonaggregate part of the query. Then, the results are grouped into multisets (or bags), an aggregation function is applied to the multisets, and the aggregates are returned as answers.

The queries that we consider in this article contain the aggregation functions count and cntd, which for a bag return the number of elements and distinct elements, respectively; parity, which returns 0 or 1, depending on whether the number of elements in the bag is even or odd; sum, prod and avg, which return the sum, product and average of the elements of a bag, respectively; max, which returns the maximum among the elements of a bag; and top2, which returns a pair consisting of the two greatest different elements of a bag.

The reader will notice in the course of the article that our results for max and top2 immediately carry over to min and bot2, which select the minimum and the two least elements out of a multiset of numbers, respectively. Moreover, our results for top2 can easily be generalized to the function topK, which selects the K greatest different elements.

Our arguments to prove decidability of equivalence for certain classes of aggregate queries rely on the fact that the aggregation functions take values in special kinds of abelian monoids and are defined in terms of the operations of those monoids. To make this formal, we will introduce the class of monoid aggregation functions and two of its subclasses. We will show that all of the above functions, except cntd, prod and avg, belong to one of these two subclasses.

In general, an aggregation function maps multisets...
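
As an informal illustration of the aggregation functions listed in this section, the following Python sketch evaluates each of them on a small bag. It is only a sketch under stated assumptions: bags are represented as Python lists of integers, the names `agg_count`, `agg_cntd`, and so on are hypothetical, and returning `None` for an empty bag (or for a missing second element in `agg_top2`) is a convention chosen here for convenience, not one taken from the article.

```python
from fractions import Fraction
from math import prod


def agg_count(bag):
    """Number of elements in the bag, counting duplicates."""
    return len(bag)


def agg_cntd(bag):
    """Number of distinct elements in the bag."""
    return len(set(bag))


def agg_parity(bag):
    """0 if the bag has an even number of elements, 1 if odd."""
    return len(bag) % 2


def agg_sum(bag):
    """Sum of the elements (0 for the empty bag)."""
    return sum(bag)


def agg_prod(bag):
    """Product of the elements (1 for the empty bag)."""
    return prod(bag)


def agg_avg(bag):
    """Exact average; None on the empty bag (assumption of this sketch)."""
    return Fraction(sum(bag), len(bag)) if bag else None


def agg_max(bag):
    """Greatest element; None on the empty bag (assumption of this sketch)."""
    return max(bag) if bag else None


def agg_top2(bag):
    """Pair of the two greatest different elements.

    Missing positions are filled with None (assumption of this sketch).
    """
    distinct = sorted(set(bag), reverse=True)
    first = distinct[0] if len(distinct) >= 1 else None
    second = distinct[1] if len(distinct) >= 2 else None
    return (first, second)


if __name__ == "__main__":
    bag = [3, 1, 3, 2]  # a bag: the element 3 occurs twice
    for f in (agg_count, agg_cntd, agg_parity, agg_sum,
              agg_prod, agg_avg, agg_max, agg_top2):
        print(f.__name__, f(bag))
```

One observation that the sketch makes easy to check: the count of a bag union is the sum of the counts of its parts, whereas the number of distinct elements of a union is not determined by the distinct counts of the parts alone. For example, `[1] + [1]` and `[1] + [2]` combine parts with the same `agg_cntd` value but yield different results.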