Abstract. In response to queries posed to a statistical database, the query system should avoid releasing summary statistics that could lead to the disclosure of confidential individual data. Attacks on the security of a statistical database may be direct or indirect. To repel them, the query system should audit queries by controlling the amount of information released by their responses. The paper focuses on sum-queries with a response variable of nonnegative real type. It proposes a compact representation of answered sum-queries, called an information model in "normal form", which allows the query system to decide whether a new sum-query can be safely answered. If it cannot, the query system issues the range of feasible values of the new sum-query that is consistent with the previously answered sum-queries. Both the management of the information model and the answering procedure require solving linear-programming problems. Standard linear-programming algorithms are not polynomially bounded, despite their good performance in practice, so effective procedures that make parsimonious use of them are stated for the general case. Moreover, in the special case where the information model is "graphical", it is shown that the answering procedure can be implemented in polynomial time.
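The range of feasible values mentioned above can be illustrated with a deliberately naive sketch. It assumes a hypothetical representation in which each answered sum-query is a set of individual identifiers paired with its value. This is not the paper's information model in normal form. Under that assumption, the feasible range of a new sum-query is the minimum and maximum of a linear program over nonnegative unknowns. The sketch does not call a linear-programming solver. Instead, it substitutes exact brute-force enumeration of basic feasible solutions using `Fraction` arithmetic. That approach is exponential in the number of individuals and only suitable for toy inputs. The function names `feasible_range`, `_solve` and `_independent_rows` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
import math


def _independent_rows(A, b):
    """Keep a maximal linearly independent subset of the equations A x = b.

    Assumes the system is consistent (the answers came from real data),
    so dropping dependent rows does not change the feasible region.
    """
    basis, kept = [], []
    for row, rhs in zip(A, b):
        v = [Fraction(x) for x in row]
        for piv, u in basis:  # eliminate v at every existing pivot column
            if v[piv] != 0:
                f = v[piv] / u[piv]
                v = [x - f * y for x, y in zip(v, u)]
        piv = next((j for j, x in enumerate(v) if x != 0), None)
        if piv is not None:  # v is not in the span of the rows kept so far
            basis.append((piv, v))
            kept.append((row, rhs))
    return kept


def _solve(M, rhs):
    """Solve the square system M y = rhs exactly; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        p = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if p is None:
            return None
        aug[col], aug[p] = aug[p], aug[col]
        for r in range(n):  # Gauss-Jordan: clear this column in every other row
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * q for a, q in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def feasible_range(answered, query):
    """Return (min, max) of sum(x_j for j in query) over all x >= 0
    consistent with answered sum-queries given as [(set_of_ids, value), ...]."""
    covered = sorted(set().union(*(s for s, _ in answered)))
    A = [[1 if j in s else 0 for j in covered] for s, _ in answered]
    rows = _independent_rows(A, [value for _, value in answered])
    c = [1 if j in query else 0 for j in covered]
    values = []
    # Every basic feasible solution of {x >= 0, A x = b} picks rank(A) basic
    # columns and sets the others to 0. Covered individuals are bounded
    # (coefficients are 0/1 and x >= 0), so the optima lie at these vertices.
    for cols in combinations(range(len(covered)), len(rows)):
        y = _solve([[row[j] for j in cols] for row, _ in rows], [r for _, r in rows])
        if y is not None and all(v >= 0 for v in y):
            values.append(sum(c[j] * v for j, v in zip(cols, y)))
    lo = min(values)  # uncovered individuals can be 0, so they never raise the minimum
    # An individual in the query but in no answered query is unbounded above.
    hi = math.inf if set(query) - set(covered) else max(values)
    return lo, hi
```

Consider the answered sum-queries x1 + x2 = 10, x2 + x3 = 7 and x3 + x4 = 5, passed as `[({1, 2}, 10), ({2, 3}, 7), ({3, 4}, 5)]`. For `{1, 3}`, `feasible_range` returns the range 3 to 13. For `{1, 4}`, it returns 8 to 8, so the value of x1 + x4 is already implied by the previous answers.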
Introduction
A statistical database (SDB) [1] is an ordinary database that contains information on individuals (persons, households, companies, organisations, etc.). However, its users are only allowed to ask for summary statistics over groups of individuals, possibly for online analytical processing (OLAP) purposes. For example, consider an SDB containing a relation named Personnel with scheme {NAME, SSN, GENDER, AGE, DEPARTMENT, SALARY}. The users of the SDB can ask for summary statistics on the attribute SALARY, using aggregate functions such as sum, average, max and min. They can do so for groups of employees that, at the conceptual level, are specified by predicates involving the attributes GENDER, AGE and DEPARTMENT. NAME and SSN are private attributes and cannot appear in these predicates. In this paper, we focus on queries such as