Simplifying XML Schema: Single-type approximations of regular tree languages

Gelade, Wouter; Idziaszek, Tomasz; Martens, Wim; Neven, Frank; Paredaens, Jan

doi:10.1016/j.jcss.2013.01.009

Cited by 9 publications

(8 citation statements)

References 20 publications


“…On the other hand, one could wonder how twig-queries can be extended while remaining within EXPTIME for testing twig-definability. When an NSTA is not equivalent to a twig, one could look at maximal sub- or minimal super-approximations, as, for instance, done in [15] for single-type EDTDs. Of course, other languages than XPath can be considered, like for instance, the Region Algebra [13], caterpillar expressions [16], or even tree-walking automata [5].…”

Section: Results (mentioning)

confidence: 99%


Deciding Twig-definability of Node Selecting Tree Automata

2015

Self Cite


Node selecting tree automata (NSTAs) constitute a general formalism defining unary queries over trees. Basically, a node is selected by an NSTA when it is visited in a selecting state during an accepting run. We consider twig patterns as an abstraction of XPath. Since the queries definable by NSTAs form a strict superset of twig-definable queries, we study the complexity of the problem to decide whether the query by a given NSTA is twig-definable. In particular, we obtain that the latter problem is EXPTIME-complete. In addition, we show that it is also EXPTIME-complete to decide whether the query by a given NSTA is definable by a node selecting string automaton.


“…the classes of unary queries they define. Indeed, a given NFA can be converted into an equivalent DFA, which can then be directly used to specify an equivalent single-type EDTD through its characterization as a DFA-based DTD [15,20].…”

Section: Single-type EDTDs (mentioning)

confidence: 99%
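The NFA-to-DFA step mentioned in the statement above can be illustrated with the textbook subset construction. This is only a minimal sketch on string automata: the function names, the dictionary encoding of the transition relation, and the example NFA are assumptions made for illustration, and the further step from a DFA to a DFA-based DTD described in [15,20] is not shown.

```python
from collections import deque


def nfa_to_dfa(alphabet, delta, start, accepting):
    """Subset construction for an NFA without epsilon transitions.

    delta maps (state, symbol) to a set of successor states.
    Each DFA state is a frozenset of NFA states.
    """
    dfa_start = frozenset([start])
    dfa_delta = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        # A subset is accepting if it contains an accepting NFA state.
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            target = frozenset(
                q for state in current for q in delta.get((state, symbol), ())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_delta, dfa_start, dfa_accepting


def dfa_accepts(dfa_delta, dfa_start, dfa_accepting, word):
    """Run the deterministic automaton on a word."""
    state = dfa_start
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accepting


# Hypothetical example NFA: words over {a, b} that end in "ab".
example_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
example_dfa = nfa_to_dfa("ab", example_delta, 0, {2})
```

Calling `dfa_accepts(*example_dfa, "aab")` runs the resulting DFA on a word. In the worst case the construction produces exponentially many subsets, although this example only reaches three.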

Deciding Twig-definability of Node Selecting Tree Automata

2015

Self Cite


“…The following result readily follows from the standard product construction of automata (see, e.g., [18]). We add the observation that, if the input EDTDs are EDTD^{un}s, then the product EDTDs for the union and intersection are also EDTD^{un}s.…”

Section: Unambiguous EDTDs (mentioning)

confidence: 90%
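For readers unfamiliar with the product construction referred to in the statement above, the following is a minimal sketch on complete deterministic string automata, not on EDTDs. The tuple encoding `(delta, start, accepting)`, the `mode` parameter, and the two example automata are assumptions made for illustration; the statement's claim about unambiguity being preserved is not demonstrated here.

```python
def product_dfa(alphabet, dfa1, dfa2, mode):
    """Product of two complete DFAs, each given as (delta, start, accepting).

    mode is "intersection" or "union" and decides which pairs accept.
    """
    d1, s1, f1 = dfa1
    d2, s2, f2 = dfa2
    start = (s1, s2)
    delta = {}
    accepting = set()
    seen = {start}
    stack = [start]
    while stack:
        p, q = stack.pop()
        in1, in2 = p in f1, q in f2
        if (in1 and in2) if mode == "intersection" else (in1 or in2):
            accepting.add((p, q))
        for symbol in alphabet:
            # Both automata advance in lockstep on the same symbol.
            nxt = (d1[(p, symbol)], d2[(q, symbol)])
            delta[((p, q), symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return delta, start, accepting


def accepts(dfa, word):
    """Run a DFA given as (delta, start, accepting) on a word."""
    delta, state, accepting = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical examples: an even number of a's, and words ending in b.
even_a = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ends_b = ({("n", "a"): "n", ("n", "b"): "y",
           ("y", "a"): "n", ("y", "b"): "y"}, "n", {"y"})
both = product_dfa("ab", even_a, ends_b, "intersection")
either = product_dfa("ab", even_a, ends_b, "union")
```

The same reachable pairs are built for both modes; only the choice of accepting pairs differs.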

“…In general, the latter schema cannot be equivalent but, hopefully, constitutes a best approximation in some well-defined way. The latter approach was taken in [18]. We later refer to this setting as the approximation scenario.…”

Section: Introduction (mentioning)

confidence: 99%

Generating, Sampling and Counting Subclasses of Regular Tree Languages

et al. 2012

Self Cite


To experimentally validate learning and approximation algorithms for XML Schema Definitions (XSDs), we need algorithms to generate uniformly at random a corpus of XSDs as well as a similarity measure to compare how closely the generated XSD resembles the target schema. In this paper, we provide the formal foundation for such a testbed. We adopt similarity measures based on counting the number of common and different trees in the two languages, and we develop the necessary machinery for computing them. We use the formalism of extended DTDs (EDTDs) to represent the unranked regular tree languages. In particular, we obtain an efficient algorithm to count the number of trees up to a certain size in an unambiguous EDTD. The latter class of unambiguous EDTDs encompasses the more familiar classes of single-type, restrained competition and bottom-up deterministic EDTDs. The single-type EDTDs correspond precisely to the core of XML Schema, while the others are strictly more expressive. We also show how constraints on the shape of allowed trees can be incorporated. As we make use of a translation into a well-known formalism for combinatorial specifications, we get for free a sampling procedure to draw members of any unambiguous EDTD. When dropping the restriction to unambiguous EDTDs, i.e. taking the full class of EDTDs into account, we show that the counting problem becomes #P-complete and provide an approximation algorithm. Finally, we discuss uniform generation of single-type EDTDs, i.e., the formal abstraction of XSDs. To this end, we provide an algorithm to generate k-occurrence automata (k-OAs) uniformly at random and show how this leads to uniform generation of single-type EDTDs.

* We acknowledge the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under the FET-Open grant agreement FOX, number FP7-ICT-233599.


“…[8,10,20,22,29,31], and the work on key approximation in [21]). These works complement our work in two senses: first, we can use the inferred schemas as inputs; second, our results can be used to measure the quality of inferred schemas, based on the quality of the optimal generator conforming to them.…”

Section: Related Work (mentioning)

confidence: 99%

Finding optimal probabilistic generators for XML collections

Abiteboul, Amsterdamer, Deutch, et al. 2012

Proceedings of the 15th International Conference on Database Theory


We study the problem of, given a corpus of XML documents and its schema, finding an optimal (generative) probabilistic model, where optimality here means maximizing the likelihood of the particular corpus to be generated. Focusing first on the structure of documents, we present an efficient algorithm for finding the best generative probabilistic model, in the absence of constraints. We further study the problem in the presence of integrity constraints, namely key, inclusion, and domain constraints. We study in this case two different kinds of generators. First, we consider a continuation-test generator that performs, while generating documents, tests of schema satisfiability; these tests prevent the generation of a document that violates the constraints but, as we will see, they are computationally expensive. We also study a restart generator that may generate an invalid document and, when this is the case, restarts and tries again. Finally, we consider the injection of data values into the structure, to obtain a full XML document. We study different approaches for generating these values.
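The restart idea described in the abstract above resembles plain rejection sampling: generate a candidate, check the constraints, and try again on failure. The sketch below only illustrates that loop under stated assumptions. `restart_generate`, `generate_ids`, `satisfies_key`, and the toy "document" (a list of three id values under a key constraint) are hypothetical names and data, not the generators studied in the cited paper.

```python
import random


def restart_generate(generate, is_valid, rng, max_attempts=1000):
    """Draw candidates until one satisfies the constraints.

    Returns the valid document and the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        document = generate(rng)
        if is_valid(document):
            return document, attempt
    raise RuntimeError("no valid document within max_attempts")


def generate_ids(rng):
    # Toy "document": three id values drawn independently from 1..5,
    # so duplicates (key violations) are possible.
    return [rng.randint(1, 5) for _ in range(3)]


def satisfies_key(document):
    # Key constraint: all id values must be distinct.
    return len(set(document)) == len(document)


doc, attempts = restart_generate(generate_ids, satisfies_key, random.Random(0))
```

In this toy setting 60 of the 125 possible draws are valid, so a restart is needed a little more than half the time. With tighter constraints the number of restarts can grow quickly, and the abstract contrasts this with the continuation-test generator, whose cost lies in the satisfiability tests instead.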

