Abstract: We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. We propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that we introduce. We prove that deterministic NSTTs capture the class of queries definable in monadic second order logic (MSO) in trees, which Gottlob and Koch (2002) argue to have the right …
“…The reason for ignoring schema information may be that it cannot be integrated into most approaches. Tree automata based techniques for the inference of regular tree languages are the exception [4,15], as we show in this article, but it requires considerable effort. Automata for local tree languages are not sufficient [19,13].…”
Section: Introduction (mentioning)
confidence: 98%
“…These range from classification [11,12,16], conditional random fields [14], inductive logic programming [7], to tree automata induction [19,13,4,15].…”
Section: Introduction (mentioning)
confidence: 99%
“…In this article, we introduce schema guidance into the learning algorithm for monadic queries represented by pruning node selecting tree transducers (pNSTTs) presented in [4]. These are tree automata that recognize monadic queries represented as tree languages.…”
Section: Introduction (mentioning)
confidence: 99%
“…An algorithm for testing functionality and an RPNI algorithm for learning NSTTs from completely annotated examples have been presented in [4].…”
Abstract. The induction of monadic node selecting queries from partially annotated XML-trees is a key task in Web information extraction. We show how to integrate schema guidance into an RPNI-based learning algorithm, in which monadic queries are represented by pruning node selecting tree transducers. We present experimental results on schema guidance by the DTD of HTML.
“…For various classes of automata, this can be done in polynomial time in the size of the sample, while there exist characteristic samples of polynomial cardinality in the size of the target automaton. This approach has been established for finite deterministic automata (Dfas) [12,16], for deterministic tree automata [17], and for deterministic stepwise tree automata for unranked trees [3].…”
Abstract. Rational functions are transformations from words to words that can be defined by string transducers. Rational functions are also captured by deterministic string transducers with lookahead. We show for the first time that the class of rational functions can be learned in the limit with polynomial time and data, when represented by string transducers with lookahead in the diagonal-minimal normal form that we introduce.