We consider the information extraction framework known as document spanners and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential variable-set automaton (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. Our goal is to have an algorithm that is tractable in combined complexity, i.e., in the sizes of the input document and the VA, while ensuring the best possible data complexity bounds in the input document size, i.e., constant delay in the document size. Several recent works at PODS’18 proposed such algorithms but with linear delay in the document size or with an exponential dependency in size of the (generally nondeterministic) input VA. In particular, Florenzano et al. suggest that our desired runtime guarantees cannot be met for general sequential VAs. We refute this and show that, given a nondeterministic sequential VA and an input document, we can enumerate the mappings of the VA on the document with the following bounds: the preprocessing is linear in the document size and polynomial in the size of the VA, and the delay is independent of the document and polynomial in the size of the VA. The resulting algorithm thus achieves tractability in combined complexity and the best possible data complexity bounds. Moreover, it is rather easy to describe, particularly for the restricted case of so-called extended VAs. Finally, we evaluate our algorithm empirically using a prototype implementation.
We give an algorithm to enumerate the results on trees of monadic second-order (MSO) queries represented by nondeterministic tree automata. After linear time preprocessing (in the input tree), we can enumerate answers with linear delay (in each answer). We allow updates on the tree to take place at any time, and we can then restart the enumeration after logarithmic time in the tree. Further, all our combined complexities are polynomial in the automaton.Our result follows our previous circuit-based enumeration algorithms based on deterministic tree automata, and is also inspired by our earlier result on words and nondeterministic sequential extended variable-set automata in the context of document spanners. We extend these results and combine them with a recent tree balancing scheme by Niewerth, so that our enumeration structure supports updates to the underlying tree in logarithmic time (with leaf insertions, leaf deletions, and node relabelings). Our result implies that, for MSO queries with free first-order variables, we can enumerate the results with linear preprocessing and constantdelay and update the underlying tree in logarithmic time, which improves on several known results for words and trees.Building on lower bounds from data structure research, we also show unconditionally that up to a doubly logarithmic factor the update time of our algorithm is optimal. Thus, unlike other settings, there can be no algorithm with constant update time.
Abiteboul et al. initiated the systematic study of distributed XML documents consisting of several logical parts, possibly located on different machines. The physical distribution of such documents immediately raises the following question: how can a global schema for the distributed document be broken up into local schemas for the different logical parts? The desired set of local schemas should guarantee that, if each logical part satisfies its local schema, then the distributed document satisfies the global schema.Abiteboul et al. proposed three levels of desirability for local schemas: local typing, maximal local typing, and perfect local typing. Immediate algorithmic questions are: (i) given a typing, determine whether it is local, maximal local, or perfect, and (ii) given a document and a schema, establish whether a (maximal) local or perfect typing exists. This paper improves the open complexity results in their work and initiates the study of (i) and (ii) for schema restrictions arising from the current standards: DTDs and XML Schemas with deterministic content models. The most striking result is that these restrictions yield tractable complexities for the perfect typing problem.Furthermore, an open problem in Formal Language Theory is settled: deciding language primality for deterministic finite automata is pspace-complete.
The downward and upward closures of a regular language L are obtained by collecting all the subwords and superwords of its elements, respectively. The downward and upward interiors of L are obtained dually by collecting words having all their subwords and superwords in L, respectively. We provide lower and upper bounds on the size of the smallest automata recognizing these closures and interiors. We also consider the computational complexity of decision problems for closures of regular languages.
BonXai is a versatile schema specification language expressively equivalent to XML Schema. It is not intended as a replacement for XML Schema but it can serve as an additional, user-friendly front-end. It offers a simple way and a lightweight syntax to specify the context of elements based on regular expressions rather than on types. In this demo we show the front-end capabilities of BonXai and exemplify its potential to offer a novel way to view existing XML Schema Definitions. In particular, we present several usage scenarios specifically targeted to showcase the ease of specifying, modifying, and understanding XML Schema Definitions through BonXai.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.