Regular Expressions with Counting: Weak versus Strong Determinism

Gelade, Wouter; Gyssens, Marc; Martens, Wim

doi:10.1007/978-3-642-03816-7_32

Cited by 22 publications

(35 citation statements)

References 19 publications


“…Regular expressions with numerical occurrence indicators have been investigated in the context of XML schema languages [15,14,22,23,29,28] since they are a part of the W3C XML Schema Language [21]. One of our PTIME upper bounds builds directly on Kilpeläinen and Tuhkanen's algorithm for membership testing of a regular expression with numerical occurrence indicators [28].…”

Section: Related Work and Further Literature (mentioning)

confidence: 99%

The complexity of evaluating path expressions in SPARQL

Losemann, Martens

2012

Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Self Cite


The World Wide Web Consortium (W3C) recently introduced property paths in SPARQL 1.1, a query language for RDF data. Property paths allow SPARQL queries to evaluate regular expressions over graph data. However, they differ from standard regular expressions in several notable aspects. For example, they have a limited form of negation, they have numerical occurrence indicators as syntactic sugar, and their semantics on graphs is defined in a non-standard manner. We formalize the W3C semantics of property paths and investigate various query evaluation problems on graphs. More specifically, let x and y be two nodes in an edge-labeled graph and r be an expression. We study the complexities of (1) deciding whether there exists a path from x to y that matches r and (2) counting how many paths from x to y match r. Our main results show that, compared to an alternative semantics of regular expressions on graphs, the complexity of (1) and (2) under W3C semantics is significantly higher. Whereas the alternative semantics remains in polynomial time for large fragments of expressions, the W3C semantics makes problems (1) and (2) intractable almost immediately. As a side-result, we prove that the membership problem for regular expressions with numerical occurrence indicators and negation is in polynomial time.
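The remark that numerical occurrence indicators are syntactic sugar can be illustrated with a small Python sketch. It is not taken from the cited paper: the function name `unfold` and its parameters are hypothetical, `atom` is assumed to be a valid Python `re` pattern, and Python's `re` module already supports `{m,n}` natively, so the unfolding is shown only to make the idea concrete.

```python
import re
from typing import Optional


def unfold(atom: str, low: int, high: Optional[int]) -> str:
    """Rewrite atom{low,high} using only concatenation, '?' and '*'.

    Hypothetical helper for illustration; high=None stands for "unbounded".
    """
    if low < 0 or (high is not None and high < low):
        raise ValueError("need 0 <= low <= high")
    group = f"(?:{atom})"
    required = group * low            # the copies that must be present
    if high is None:
        return required + group + "*"  # atom{low,} : low copies, then a star
    # The remaining (high - low) copies are optional and nested:
    # (a(a(a)?)?)? accepts zero to three further copies of a.
    optional = ""
    for _ in range(high - low):
        optional = f"(?:{group}{optional})?"
    return required + optional


if __name__ == "__main__":
    print(unfold("ab", 2, 4))
    print(bool(re.fullmatch(unfold("ab", 2, 4), "ababab")))
```

The unfolded expression contains `high` copies of `atom`, so its length grows linearly in the value of the upper bound, which is exponential in the number of digits needed to write that bound. This blow-up is one reason counting is usually handled directly rather than by unfolding.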


“…Regular expressions with numerical occurrence indicators have been investigated in the context of XML schema languages [Colazzo et al. 2009a, 2009b; Gelade et al. 2009, 2012; Kilpeläinen and Tuhkanen 2003, 2007] since they are a part of the W3C XML Schema Language [Gao et al. 2009]. One of our polynomial time upper bounds builds directly on Kilpeläinen and Tuhkanen's algorithm for membership testing of a regular expression with numerical occurrence indicators [Kilpeläinen and Tuhkanen 2003].…”

Section: Related Work and Further (mentioning)

confidence: 96%

The complexity of regular expressions and property paths in SPARQL

Losemann, Martens

2013

ACM Trans. Database Syst.

Self Cite


The World Wide Web Consortium (W3C) recently introduced property paths in SPARQL 1.1, a query language for RDF data. Property paths allow SPARQL queries to evaluate regular expressions over graph-structured data. However, they differ from standard regular expressions in several notable aspects. For example, they have a limited form of negation, they have numerical occurrence indicators as syntactic sugar, and their semantics on graphs is defined in a nonstandard manner. We formalize the W3C semantics of property paths and investigate various query evaluation problems on graphs. More specifically, let x and y be two nodes in an edge-labeled graph and r be an expression. We study the complexities of: (1) deciding whether there exists a path from x to y that matches r and (2) counting how many paths from x to y match r. Our main results show that, compared to an alternative semantics of regular expressions on graphs, the complexity of (1) and (2) under W3C semantics is significantly higher. Whereas the alternative semantics remains in polynomial time for large fragments of expressions, the W3C semantics makes problems (1) and (2) intractable almost immediately. As a side-result, we prove that the membership problem for regular expressions with numerical occurrence indicators and negation is in polynomial time.


“…Indeed, several people want to abandon the notion as its only reason for existence is to ensure compatibility with SGML parsers and, furthermore, because it is not a transparent one for the average user which is witnessed by several practical studies [8,16] that found a number of nondeterministic content models in actual DTDs. In practice, XSDs allow numerical occurrence constraints in their regular expressions, which complicates the definition of determinism even more [26,19]. In fact, Van der Vlist notes that Clarke and Murata already abandoned the notion in their Relax NG specification [54], the most serious competitor for XML Schema.…”

Section: Factor (mentioning)

confidence: 96%

Complexity of Decision Problems for XML Schemas and Chain Regular Expressions

Martens, Neven, Schwentick

2010

SIAM J. Comput.

Self Cite


We study the complexity of the inclusion, equivalence, and intersection problem of extended chain regular expressions (eCHAREs). These are regular expressions with a very simple structure: they basically consist of the concatenation of factors, where each factor is a disjunction of strings, possibly extended with "*", "+", or "?". Though of a very simple form, the usage of such expressions is widespread as eCHAREs, for instance, constitute a super class of the regular expressions most frequently used in practice in schema languages for XML. In particular, we show that all our lower and upper bounds for the inclusion and equivalence problem carry over to the corresponding decision problems for extended context-free grammars, and to single-type and restrained competition tree grammars. These grammars form abstractions of document type definitions (DTDs), XML schema definitions (XSDs) and the class of one-pass preorder typeable XML Schemas, respectively. For the intersection problem, we show that obtained complexities only carry over to DTDs. In this respect, we also study two other classes of regular expressions related to XML: deterministic expressions and expressions where the number of occurrences of alphabet symbols is bounded by a constant.

Introduction. Although the complexity of the basic decision problems for regular expressions (inclusion, equivalence, and nonemptiness of intersection) have been studied in depth in the 1970's [25, 28, 51], the fragment of (extended) chain regular expressions (eCHAREs) has been largely untreated and is motivated by the much more recent rise of XML theory [29,42,45,55]. Although our initial motivation to study this particular fragment stems from our interest in schema languages for XML, simple regular expressions like eCHAREs also occur outside the realm of XML. For instance, eCHAREs are a superset of the sequence motifs used in bioinformatics [39] and are used in verification of lossy channel systems [1], where they appear as factors of simple regular expressions. The presentation of the paper is therefore split up into two parts. The first part considers the complexity of the basic decision problems for eCHAREs and two other classes. The second part shows how results on decision problems for regular expressions can be lifted to corresponding results on decision problems for XML schema languages.
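The eCHARE shape described in the abstract above (a concatenation of factors, each a disjunction of strings optionally followed by "*", "+", or "?") can be made concrete with a minimal Python sketch. The function name `echare` and the list-of-pairs input format are assumptions made for illustration and do not come from the cited paper.

```python
import re
from typing import List, Pattern, Tuple

# One factor = (list of alternative strings, modifier), modifier in "", "*", "+", "?".
Factor = Tuple[List[str], str]


def echare(factors: List[Factor]) -> Pattern[str]:
    """Compile a chain of factors into a Python regex (illustrative helper)."""
    parts = []
    for alternatives, modifier in factors:
        if modifier not in ("", "*", "+", "?"):
            raise ValueError(f"unsupported modifier: {modifier!r}")
        if not alternatives:
            raise ValueError("a factor needs at least one string")
        # Each alternative is a plain string, so escape regex metacharacters.
        disjunction = "|".join(re.escape(word) for word in alternatives)
        parts.append(f"(?:{disjunction}){modifier}")
    return re.compile("".join(parts))


if __name__ == "__main__":
    # (a + bc)+ . d? . (e + f)
    pattern = echare([(["a", "bc"], "+"), (["d"], "?"), (["e", "f"], "")])
    print(bool(pattern.fullmatch("abcade")))
    print(bool(pattern.fullmatch("de")))
```

Because every factor is built from literal strings only, the structure stays flat: no nesting of stars or disjunctions is possible, which is the restriction that distinguishes this fragment from general regular expressions.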
