Link to this article: http://journals.cambridge.org/abstract_S0960129511000740

How to cite this article: MELITA HAJDINJAK and GAVIN BIERMAN (2012). Extending relational algebra with similarities.

In this paper we propose various extensions to the relational model to support similarity-based querying. We build upon the K-relation model, where tuples are assigned values from an arbitrary semiring K, and its associated positive relational algebra $RA^+_K$. We consider a recently proposed extension to $RA^+_K$ using a monus operation on the semiring to support negative queries, and show how, surprisingly, it fails for important 'fuzzy' semirings. Instead, we suggest using a negation operator. We also consider the identities satisfied by the relational algebra $RA^+_K$. We show that moving from a semiring to a particular form of lattice (a De Morgan frame) yields a relational algebra that satisfies all the classical (positive) relational algebra identities. We claim that to support real-world similarity queries realistically, one must move from tuple-level annotations to attribute-level annotations. We show in detail how our De Morgan frame-based model can be extended to support attribute-level annotations and give worked examples of similarity queries in this setting.

[...] scenario of a database representing bus connections in a city. Finally, in Section 7, we give our conclusions, consider some related work and describe some plans for future work.
Background: the K-relation model

In this section we recall the definitions of K-relations and the positive relational algebra $RA^+_K$, along with its extension $RA^+_K(\setminus)$ to support negative queries. The aim of the K-relation work was to provide a generalised framework capable of capturing various forms of annotated relations. As similarity can clearly be viewed as a form of annotation, we will use this as our foundation for a model of similarity-based querying.

2.1. Positive relational algebra $RA^+_K$

We first assume some base domains, or types, commonly written as $\tau$, which are simply sets of ground values. In our examples we use common base types such as integers and strings. We adopt the named-attribute approach, so in our model a schema $U$, which is written $\{a_1 : \tau_1, \ldots, a_n : \tau_n\}$, is a finite map from attribute names $a_i$ to their types or domains $U(a_i) = \tau_i$. We represent a $U$-tuple as a map $t = \{a_1 : v_1, \ldots, a_n : v_n\}$ from attribute names [...] We denote the set of all $U$-tuples by $U$-Tup.

Recall that a semiring $K = (K, \oplus, \otimes, 0, 1)$ is an algebraic structure with two binary operations (sum $\oplus$ and product $\otimes$) and two distinguished elements ($0 \neq 1$) such that $(K, \oplus, 0)$ is a commutative monoid with identity element 0, $(K, \otimes, 1)$ is a monoid with identity element 1, products distribute over sums, and $0 \otimes a = a \otimes 0 = 0$ for any $a \in K$ (that is, 0 is an annihilating element). A semiring $K$ is said to be commutative if the monoid $(K, \otimes, 1)$ is commutative.

[...] et al. 2007)). Let $K = (K, \oplus, \otimes, 0, 1)$ be a commutative semiring. A K-relation over a schema $U$ ...
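
As a minimal sketch of the semiring definition recalled in this subsection, the following Python fragment encodes a semiring as a record of its two operations and two distinguished elements, and tests the axioms by brute force on a finite sample of the carrier. The names `Semiring`, `satisfies_axioms`, `is_commutative`, `BOOLEAN`, `NATURAL` and `FUZZY` are hypothetical and introduced only for illustration. The three instances are the usual textbook examples; in particular, taking $([0,1], \max, \min, 0, 1)$ as the 'fuzzy' semiring is an assumption about which structure is meant. Passing the check on a sample is evidence, not a proof, that the axioms hold on the whole carrier.

```python
import operator
from dataclasses import dataclass
from itertools import product
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Semiring(Generic[T]):
    """Hypothetical record for a semiring (K, plus, times, zero, one)."""
    plus: Callable[[T, T], T]
    times: Callable[[T, T], T]
    zero: T
    one: T


def satisfies_axioms(k: Semiring, sample: Iterable) -> bool:
    """Brute-force check of the semiring axioms on a finite sample."""
    xs = list(sample)
    if k.zero == k.one:  # the two distinguished elements must differ
        return False
    for a in xs:
        # 0 is the identity for plus, 1 is the identity for times
        if k.plus(a, k.zero) != a or k.plus(k.zero, a) != a:
            return False
        if k.times(a, k.one) != a or k.times(k.one, a) != a:
            return False
        # 0 is an annihilating element for times
        if k.times(a, k.zero) != k.zero or k.times(k.zero, a) != k.zero:
            return False
    for a, b in product(xs, repeat=2):
        if k.plus(a, b) != k.plus(b, a):  # plus is commutative
            return False
    for a, b, c in product(xs, repeat=3):
        # both operations are associative
        if k.plus(k.plus(a, b), c) != k.plus(a, k.plus(b, c)):
            return False
        if k.times(k.times(a, b), c) != k.times(a, k.times(b, c)):
            return False
        # products distribute over sums, on the left and on the right
        if k.times(a, k.plus(b, c)) != k.plus(k.times(a, b), k.times(a, c)):
            return False
        if k.times(k.plus(a, b), c) != k.plus(k.times(a, c), k.times(b, c)):
            return False
    return True


def is_commutative(k: Semiring, sample: Iterable) -> bool:
    """Check that times is commutative on the sample."""
    xs = list(sample)
    return all(k.times(a, b) == k.times(b, a) for a, b in product(xs, repeat=2))


# Assumed standard instances, not taken verbatim from the text above.
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
NATURAL = Semiring(operator.add, operator.mul, 0, 1)
FUZZY = Semiring(max, min, 0.0, 1.0)  # carrier [0, 1]
```

Calling `satisfies_axioms(FUZZY, [0.0, 0.25, 0.5, 1.0])` exercises every axiom on a four-element sample of the unit interval. Because `max` and `min` only select one of their arguments, the comparisons are exact even with floating-point values.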