Document spanners are a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). One of the central models in this framework is the class of core spanners, which formalize the query language AQL that is used in IBM's SystemT. As shown by Freydenberger and Holldack (ICDT 2016, ToCS 2018), there is a connection between core spanners and $\mathsf{EC}^{\mathrm{reg}}$, the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining SpLog, a fragment of $\mathsf{EC}^{\mathrm{reg}}$ that has the same expressive power as core spanners. This equivalence extends beyond expressive power, as we show the existence of polynomial-time conversions between SpLog and core spanners. Consequences and applications include an alternative way of defining relations for spanners, a pumping lemma for core spanners, and insights into the relative succinctness of various classes of spanner representations and their connection to graph querying languages. We also briefly discuss the connection between SpLog with negation and core spanners with a difference operator.
Theory Comput Syst
More specifically, the distinction between these two definitions is only meaningful when dealing with constraints on variables that do not occur in word equations (as in formulas that consist only of constraint symbols). From an $\mathsf{EC}^{\mathrm{reg}}$ point of view, these are possible (although not of particular importance); but for spanners, these are not relevant.
Example 2.1 Consider the $\mathsf{EC}$-formula $\varphi_1(x, y, z) := \exists \hat{x}, \hat{y} \colon (x = z\hat{x} \land y = z\hat{y})$ and the $\mathsf{EC}^{\mathrm{reg}}$-formula $\varphi_2(x, y, z) := \exists \hat{x}, \hat{y} \colon (x = z\hat{x} \land y = z\hat{y} \land C_{\Sigma^+}(z))$. Then $\sigma \models \varphi_1$ if and only if $\sigma(x)$ and $\sigma(y)$ have $\sigma(z)$ as common prefix. If, in addition to this, $\sigma(z) \neq \varepsilon$, then $\sigma \models \varphi_2$.
Word equations and $\mathsf{EC}$ have the same expressive power (cf. Choffrut and Karhumäki [6] or Karhumäki, Mignosi, and Plandowski [30]). More formally, for every $\mathsf{EC}$-formula $\varphi$, one can construct a word equation $\eta$ with $\mathsf{var}(\eta) \supseteq \mathsf{free}(\varphi)$, such that $\sigma \models \varphi$ if and only if there is a $\sigma'$ with $\sigma' \models \eta$ and $\sigma'(x) = \sigma(x)$ for all $x \in \mathsf{free}(\varphi)$. This can directly be extended to convert any $\mathsf{EC}^{\mathrm{reg}}$-formula into a word equation with constraints (cf. Diekert [10]).
For conjunctions, the construction is easily explained: Choose distinct $a, b \in \Sigma$. Hmelevskii's pattern pairing function is defined by $\langle \alpha, \beta \rangle := \alpha a \beta \alpha b \beta$. Then $(\alpha_L = \alpha_R) \land (\beta_L = \beta_R)$ holds if and only if $\langle \alpha_L, \beta_L \rangle = \langle \alpha_R, \beta_R \rangle$. This follows from a simple length argument, where the terminals $a$ and $b$ act as "barriers" that prevent unintended equalities (see Section 5.3 of [6] for details). The construction for disjunctions is similar, but it is more involved and introduces new variables. Furthermore, converting alternating disjunctions and conjunctions may increase the size exponentially.
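The conjunction construction above can be checked by brute force on short words. The following Python sketch is only an illustration, not part of the original construction: it applies the pairing function to concrete words (that is, to the images of the patterns under a substitution) rather than to patterns with variables. The names `pair`, `naive_pair`, `words`, and `check_pairing` are hypothetical helpers introduced here, and the alphabet is assumed to be just $\{a, b\}$.

```python
from itertools import product

A, B = "a", "b"  # two distinct terminal letters, as chosen in the text


def pair(alpha: str, beta: str) -> str:
    """Pairing <alpha, beta> := alpha a beta alpha b beta, on concrete words."""
    return alpha + A + beta + alpha + B + beta


def naive_pair(alpha: str, beta: str) -> str:
    """Plain concatenation without barriers (hypothetical, for contrast only)."""
    return alpha + beta


def words(alphabet: str, max_len: int):
    """Yield every word over `alphabet` of length at most `max_len`."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def check_pairing(max_len: int = 2) -> bool:
    """Test exhaustively, for all words up to max_len, that
    (al == ar and bl == br) holds iff pair(al, bl) == pair(ar, br)."""
    ws = list(words(A + B, max_len))
    for al, bl, ar, br in product(ws, repeat=4):
        both_equal = (al == ar) and (bl == br)
        paired_equal = pair(al, bl) == pair(ar, br)
        if both_equal != paired_equal:
            return False
    return True
```

Under these assumptions, `check_pairing()` finds no counterexample among the words it enumerates, whereas plain concatenation already fails on short inputs: `naive_pair("a", "b")` and `naive_pair("ab", "")` coincide even though the components differ. This is the kind of unintended equality that the barrier letters rule out.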
Document Spanners
Spanners and Primitive Spanner Representations
Let $w := a_1 a_2 \cdots a_n$ be a word over $\Sigma$, with $n \geq 0$ and $a_1, \ldots, a_n \in \Sigma$. A span of $w$ is an interval $[i, j\rangle$ with $1 \leq i \leq j \leq n + 1$...