Abstract. Modern software systems place great emphasis on heterogeneous communication. For disparate applications to communicate effectively, a generic theory of data is required that works at the inter-application level. The key feature of such a theory is full generality, where the data model of an application is not restricted to any particular modeling formalism. Existing solutions do not have this property: while any data can be encoded in XML or using the Semantic Web, such representations provide only basic generality, in that an arbitrary application's data model must be re-expressed in the formalism in question before it can be reasoned about. In this paper we present a theory of data which is fully generic and which uses an extensible design to allow the underlying formalisms to be incorporated into a specification only when necessary. We then show how this theory can be used to investigate two common data equivalence problems, canonicalization and transformation, independently of the datatypes involved.