We explore the concept of hybrid grammars, which formalize and generalize a range of existing frameworks for dealing with discontinuous syntactic structures. Both discontinuous phrase structures and non-projective dependency structures are covered. Technically, hybrid grammars are related to synchronous grammars, in which one grammar component generates linear structures and another generates hierarchical structures. By coupling the lexical elements of the two components, discontinuous structures arise. Several types of hybrid grammars are characterized. We also discuss grammar induction from treebanks. The main advantage over existing frameworks is the ability of hybrid grammars to separate the discontinuity of the desired structures from the time complexity of parsing. This permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties. This is confirmed by the reported experimental results, which show a wide range of running times, accuracies, and parse-failure rates.
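To make the coupling of a linear and a hierarchical component concrete, the following minimal Python sketch links tree nodes to string positions through shared lexical indices, so that a constituent may cover non-adjacent positions. All names and the toy example are illustrative assumptions, not the formalism of the paper.

```python
# Minimal sketch, not the paper's formalism: a hierarchical component (a tree)
# is coupled to a linear component (a list of tokens) via shared lexical
# indices. A constituent is discontinuous if the positions it is anchored to
# are not contiguous.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    lex_index: Optional[int] = None  # position in the linear component, if lexical

    def anchored_positions(self):
        if self.lex_index is not None:
            yield self.lex_index
        for child in self.children:
            yield from child.anchored_positions()

def is_discontinuous(node: Node) -> bool:
    positions = sorted(node.anchored_positions())
    return bool(positions) and positions != list(range(positions[0], positions[-1] + 1))

# Toy example (German "Schnell hat er gelacht"): the verbal constituent
# "hat ... gelacht" is anchored to positions 1 and 3 only.
tokens = ["Schnell", "hat", "er", "gelacht"]
vp = Node("VP", [Node("VAFIN", lex_index=1), Node("VVPP", lex_index=3)])
print(is_discontinuous(vp))  # True: the constituent skips position 2
```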
We present new experiments that transfer techniques from Probabilistic Context-Free Grammars with Latent Annotations (PCFG-LA) to two grammar formalisms for discontinuous parsing: linear context-free rewriting systems and hybrid grammars. In particular, Dirichlet priors during EM training, ensemble models, and a new nonterminal scheme for hybrid grammars are evaluated. We find that our grammars are more accurate than previous approaches based on discontinuous grammar formalisms and early instances of discriminative models, but inferior to recent discriminative parsers.
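One of the transferred techniques, a Dirichlet prior during EM training, amounts to smoothing the expected rule counts before renormalization in the M-step. The hedged Python sketch below shows a MAP-style update under a symmetric Dirichlet prior; the function name, data layout, and the value of alpha are assumptions for illustration, not the setup used in the reported experiments.

```python
from collections import defaultdict

def m_step_with_dirichlet(expected_counts, alpha=1.1):
    """MAP re-estimation of rule probabilities from E-step expectations.

    expected_counts: dict mapping (lhs, rhs) -> expected count.
    alpha: symmetric Dirichlet hyperparameter (assumed > 1, so that
           alpha - 1 acts as a pseudo-count added to every rule).
    """
    lhs_totals = defaultdict(float)
    for (lhs, _), count in expected_counts.items():
        lhs_totals[lhs] += count + (alpha - 1.0)
    return {
        (lhs, rhs): (count + (alpha - 1.0)) / lhs_totals[lhs]
        for (lhs, rhs), count in expected_counts.items()
    }

# Toy usage: two rules sharing the left-hand side VP.
counts = {("VP", ("V", "NP")): 3.2, ("VP", ("V",)): 0.8}
print(m_step_with_dirichlet(counts))
```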
We develop the concept of weighted aligned hypergraph bimorphism where the weights may, in particular, represent probabilities. Such a bimorphism consists of an ℝ≥0-weighted regular tree grammar, two hypergraph algebras that interpret the generated trees, and a family of alignments between the two interpretations. Semantically, this yields a set of bihypergraphs, each consisting of two hypergraphs and an explicit alignment between them; e.g., discontinuous phrase structures and non-projective dependency structures are bihypergraphs. We present an EM-training algorithm which takes a corpus of bihypergraphs and an aligned hypergraph bimorphism as input and generates a sequence of weight assignments which converges to a local maximum or saddle point of the likelihood function of the corpus.
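To make the central data structure concrete, the following small Python sketch represents a bihypergraph as two hypergraphs together with an explicit alignment between their hyperedges. All names and field choices are illustrative assumptions, not the definitions of the paper.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass(frozen=True)
class Hyperedge:
    label: str
    attachment: Tuple[int, ...]  # vertices the hyperedge is attached to

@dataclass
class Hypergraph:
    vertices: Set[int]
    edges: List[Hyperedge]

@dataclass
class Bihypergraph:
    first: Hypergraph                 # e.g., encodes a phrase structure
    second: Hypergraph                # e.g., encodes the sentence it analyses
    alignment: Set[Tuple[int, int]]   # (i, j): edge i of `first` aligned with edge j of `second`

# Toy instance: one aligned pair of hyperedges.
g1 = Hypergraph({0, 1}, [Hyperedge("NP", (0, 1))])
g2 = Hypergraph({0, 1}, [Hyperedge("hearing", (0, 1))])
b = Bihypergraph(g1, g2, {(0, 0)})
```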