We study the complexity of the inclusion, equivalence, and intersection problems for extended chain regular expressions (eCHAREs). These are regular expressions with a very simple structure: they basically consist of the concatenation of factors, where each factor is a disjunction of strings, possibly extended with "*", "+", or "?". Though of a very simple form, such expressions are widely used: eCHAREs, for instance, constitute a superclass of the regular expressions most frequently used in practice in schema languages for XML. In particular, we show that all our lower and upper bounds for the inclusion and equivalence problems carry over to the corresponding decision problems for extended context-free grammars, and to single-type and restrained competition tree grammars. These grammars form abstractions of document type definitions (DTDs), XML schema definitions (XSDs) and the class of one-pass preorder typeable XML Schemas, respectively. For the intersection problem, we show that the obtained complexities only carry over to DTDs. In this respect, we also study two other classes of regular expressions related to XML: deterministic expressions and expressions where the number of occurrences of alphabet symbols is bounded by a constant.
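To make the shape of an eCHARE concrete, the following minimal sketch builds one from a list of factors and tests whether a word belongs to its language. The representation (a list of pairs of strings and a modifier), the names `echare_to_regex` and `matches`, and the example expression are illustrative assumptions, not notation from the paper. The sketch only decides membership of a single word; it does not address the inclusion, equivalence, or intersection problems studied here.

```python
import re

# Assumed representation (hypothetical, for illustration only):
# an eCHARE is a list of factors, and each factor is a pair
# (strings, modifier) where `strings` is a non-empty list of words
# forming the disjunction and `modifier` is "", "*", "+", or "?".
ALLOWED_MODIFIERS = ("", "*", "+", "?")


def echare_to_regex(factors):
    """Translate a list of factors into a compiled Python regex."""
    parts = []
    for strings, modifier in factors:
        if not strings:
            raise ValueError("a factor needs at least one string")
        if modifier not in ALLOWED_MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier!r}")
        # Escape each string so that it is matched literally.
        alternatives = "|".join(re.escape(s) for s in strings)
        parts.append(f"(?:{alternatives}){modifier}")
    return re.compile("".join(parts))


# Example expression: (ab + c)* . d . (e + fg)?
EXAMPLE = [(["ab", "c"], "*"), (["d"], ""), (["e", "fg"], "?")]
PATTERN = echare_to_regex(EXAMPLE)


def matches(word):
    """Return True if the whole word is in the language of EXAMPLE."""
    return PATTERN.fullmatch(word) is not None
```

Under these assumptions, `matches("abcabd")` and `matches("dfg")` hold, whereas `matches("abe")` does not, because the middle factor `d` is mandatory.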
Introduction. Although the complexity of the basic decision problems for regular expressions (inclusion, equivalence, and nonemptiness of intersection) was studied in depth in the 1970s [25, 28, 51], the fragment of (extended) chain regular expressions (eCHAREs) has been largely untreated and is motivated by the much more recent rise of XML theory [29, 42, 45, 55]. Although our initial motivation to study this particular fragment stems from our interest in schema languages for XML, simple regular expressions like eCHAREs also occur outside the realm of XML. For instance, eCHAREs are a superset of the sequence motifs used in bioinformatics [39] and are used in verification of lossy channel systems [1], where they appear as factors of simple regular expressions. The presentation of the paper is therefore split up into two parts. The first part considers the complexity of the basic decision problems for eCHAREs and two other classes. The second part shows how results on decision problems for regular expressions can be lifted to corresponding results on decision problems for XML schema languages.