While the theory of languages of words is very mature, our understanding of relations on words is still lagging behind. And yet such relations appear in many new applications, such as verification of parameterized systems, querying graph-structured data, and information extraction. Classes of well-behaved relations typically used in such applications are obtained by adapting some of the equivalent definitions of regularity of words to relations, leading to non-equivalent notions of recognizable, regular, and rational relations. The goal of this paper is to propose a systematic way of defining classes of relations on words, of which these three classes are just natural examples, and to demonstrate its advantages compared to some of the standard techniques for studying word relations. The key idea is that of a synchronization of a pair of words, which is a word over an extended alphabet. Using it, we define classes of relations via classes of regular languages over a fixed alphabet, just {1, 2} for binary relations. We characterize some of the standard classes of relations on words via finiteness of parameters of synchronization languages, called shift, lag, and shift-lag. We describe these conditions in terms of the structure of cycles of the graphs underlying automata, thereby showing their decidability. We show that for these classes there exist canonical synchronization languages, and that every class of relations can be effectively re-synchronized using those canonical representatives. We also give sufficient conditions on synchronization languages, defined in terms of injectivity and surjectivity of their Parikh images, that guarantee closure under intersection and complement of the classes of relations they define.
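To make the notion of synchronization concrete, here is a minimal Python sketch. It assumes a synchronization is encoded as a list of (tag, letter) pairs with tag in {1, 2}, where tag-1 letters spell the first word and tag-2 letters spell the second; its shape is the underlying word over {1, 2}. The definitions of lag (largest difference between the numbers of 1s and 2s over all prefixes) and shift (number of switches between 1 and 2) are our reading of these parameters and may differ in detail from the paper's formal definitions; shift-lag is not modeled. The two regular expressions are illustrative: as we understand the framework, 1*2* (read all of u, then all of v) and (12)*(1*+2*) (read u and v letter by letter) are the natural synchronization languages for recognizable and regular relations, respectively, but this correspondence is stated here as an assumption. All function names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed encoding: a synchronization is a list of (tag, letter) pairs, tag in {1, 2}.
Sync = List[Tuple[int, str]]

# Illustrative synchronization languages over {1, 2} (hypothetical names).
SEQUENTIAL = re.compile(r"1*2*")           # all of u first, then all of v
SYNCHRONOUS = re.compile(r"(12)*(1*|2*)")  # letter by letter, then the leftover tail


def to_pair(sync: Sync) -> Tuple[str, str]:
    """Project a synchronization onto the pair of words (u, v) it synchronizes."""
    u = "".join(a for t, a in sync if t == 1)
    v = "".join(a for t, a in sync if t == 2)
    return u, v


def shape(sync: Sync) -> str:
    """Forget the letters, keeping only the word over {1, 2}."""
    return "".join(str(t) for t, _ in sync)


def lag(w: str) -> int:
    """Largest |#1 - #2| over all prefixes of w (assumed definition)."""
    diff, best = 0, 0
    for c in w:
        diff += 1 if c == "1" else -1
        best = max(best, abs(diff))
    return best


def shift(w: str) -> int:
    """Number of adjacent positions where w switches between 1 and 2 (assumed definition)."""
    return sum(1 for x, y in zip(w, w[1:]) if x != y)


def conforms(sync: Sync, language: re.Pattern) -> bool:
    """Check whether the shape of a synchronization lies in a given synchronization language."""
    return language.fullmatch(shape(sync)) is not None


# The pair ("ab", "ba") synchronized in two different ways.
interleaved: Sync = [(1, "a"), (2, "b"), (1, "b"), (2, "a")]
sequential: Sync = [(1, "a"), (1, "b"), (2, "b"), (2, "a")]

for s in (interleaved, sequential):
    w = shape(s)
    print(to_pair(s), w, "lag", lag(w), "shift", shift(w),
          "sequential", conforms(s, SEQUENTIAL), "synchronous", conforms(s, SYNCHRONOUS))
```

Both synchronizations describe the same pair, but the interleaved shape 1212 has lag 1 and shift 3, while the sequential shape 1122 has lag 2 and shift 1. Along the family 1^n 2^n the lag grows without bound while the shift stays at 1, and along (12)^n the reverse happens, which is the kind of difference that finiteness of these parameters is meant to capture.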
ACM Subject Classification F.4.3 Formal Languages
Keywords and phrases
Introduction

Foundations of formal language theory were largely developed in the 1960s and 1970s and have been used heavily in practically all areas of computer science. The field itself stayed somewhat dormant for a while, but that changed over the past 10 to 15 years due to new application areas requiring techniques that could not have been foreseen 30 or 40 years earlier. Among the consumers of results in formal language theory are verification (for instance, automata-based approaches to model checking are now part of standard industrial verification tools [7,22]) and data management (standards for describing and querying XML documents, for instance, are rooted in both word and tree automata [24,28], and emerging graph data models borrow many formal language concepts [3]). Of interest to us in this paper are relations on words. That is, for a given finite alphabet A, we deal with binary relations R ⊆ A* × A*. Their study goes back to Elgot and Mezei, and to Nivat, in the 1960s [15,25], with much subsequent work done later (see, e.g., the surveys [8,13]). The standard notions of regularity that generate the same class of languages (recognizability by finite monoids, definability by automata, or definability by regular expressions) give rise to ...