A statistical parser for Czech

Collins, Michael; Hajič, Jan; Ramshaw, Lance; Tillmann, Christoph

doi:10.3115/1034678.1034754

Cited by 101 publications

(87 citation statements)

References 8 publications


“…These filtering criteria are discussed in more detail in the experimental sections. The remaining set of projected trees becomes the treebank that will be used to train a new dependency parser - we conduct our experiments using a version of the Collins parser that has been adapted for dependency treebanks (Collins et al. 1999). Once trained, the new parser is ready to generate dependency analyses for unseen new sentences in that language.…”

Section: Our Projection Framework For Bootstrapping Parsers (mentioning)

confidence: 99%

Bootstrapping parsers via syntactic projection across parallel texts

et al. 2005


Broad coverage, high quality parsers are available for only a handful of languages. A prerequisite for developing broad coverage parsers for more languages is the annotation of text with the desired linguistic representations (also known as "treebanking"). However, syntactic annotation is a labor intensive and time-consuming process, and it is difficult to find linguistically annotated text in sufficient quantities. In this article, we explore using parallel text to help solve the problem of creating syntactic annotation in more languages. The central idea is to annotate the English side of a parallel corpus, project the analysis to the second language, and then train a stochastic analyzer on the resulting noisy annotations. We discuss our background assumptions, describe an initial study on the "projectability" of syntactic relations, and then present two experiments in which stochastic parsers are developed with minimal human intervention via projection from English.
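The projection step described in this abstract can be illustrated with a minimal sketch. It assumes a simplified direct-correspondence rule that is not taken from the article: a dependency edge is copied to the target sentence only when both its head and its dependent are aligned one-to-one. The function name `project_dependencies`, the 1-based indexing, and the use of 0 for the root are all hypothetical choices made for this illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def project_dependencies(
    source_heads: Dict[int, int],
    alignment: List[Tuple[int, int]],
) -> Dict[int, int]:
    """Copy dependency edges from a source sentence to a target sentence.

    source_heads maps each source word index (1-based) to its head index,
    with 0 standing for the artificial root.
    alignment is a list of (source_index, target_index) pairs.
    Only words aligned one-to-one are used (an assumption of this sketch);
    every other word is left without a projected head.
    """
    src_count = Counter(s for s, _ in alignment)
    tgt_count = Counter(t for _, t in alignment)
    # Keep only the alignment links that are one-to-one on both sides.
    src_to_tgt = {
        s: t for s, t in alignment
        if src_count[s] == 1 and tgt_count[t] == 1
    }

    projected: Dict[int, int] = {}
    for dep, head in source_heads.items():
        if dep not in src_to_tgt:
            continue  # the dependent has no usable counterpart
        if head == 0:
            projected[src_to_tgt[dep]] = 0  # the root attachment carries over
        elif head in src_to_tgt:
            projected[src_to_tgt[dep]] = src_to_tgt[head]
    return projected


# Toy example: a three-word source sentence whose second word is the root,
# aligned to a target sentence with the last two words swapped.
heads = {1: 2, 2: 0, 3: 2}
links = [(1, 1), (2, 3), (3, 2)]
print(project_dependencies(heads, links))
```

Trees produced this way are partial and noisy, which is why the abstract speaks of training a stochastic analyzer on "noisy annotations" rather than treating the projected trees as gold data.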


“…First, Czech is a "highly inflected" language: the role of function words in the Germanic and Romance languages is typically filled by suffixes in Czech. Second, Czech exhibits a "relatively free word order" [7]. Since a great deal of the POS information exploited by an HMM tagger is contained in sequences of function words, these features of Czech hinder the performance of an HMM POS tagger.…”

Section: Single-source Taggers (mentioning)

confidence: 99%

Automatically Inducing a Part-of-Speech Tagger by Projecting from Multiple Source Languages Across Aligned Corpora

Fossum, Abney 2005

Lecture Notes in Computer Science


Abstract. We implement a variant of the algorithm described by Yarowsky and Ngai in [21] to induce an HMM POS tagger for an arbitrary target language using only an existing POS tagger for a source language and an unannotated parallel corpus between the source and target languages. We extend this work by projecting from multiple source languages onto a single target language. We hypothesize that systematic transfer errors from differing source languages will cancel out, improving the quality of bootstrapped resources in the target language. Our experiments confirm the hypothesis. Each experiment compares three cases: (a) source data comes from a single language A, (b) source data comes from a single language B, and (c) source data comes from both A and B, but half as much from each. Apart from the source language, other conditions are held constant in all three cases - including the total amount of source data used. The null hypothesis is that performance in the mixed case would be an average of performance in the single-language cases, but in fact, mixed-case performance always exceeds the maximum of the single-language cases. We observed this effect in all six experiments we ran, involving three different source-language pairs and two different target languages.


“…A Type-III tree will be built by using the part-of-speech of the visited node x as the root, connecting the produced sub-tree tmp to the root. If the child of the visited node y does not have any children, a Type-II tree will be built instead (lines 20-21). Figure 7 shows an example of DIG elementary trees extracted from the annotated-tree text "I ate boiled rice with my friend".…”

Section: Extracting Elementary Trees From the Treebank (mentioning)

confidence: 99%

“…Of course, a rich set of training data and accurate knowledge are crucial for this method. Various methods have been proposed for the learning part of this approach: learning actions of a deterministic parser [18], [19], learning similarity of tree structures [20], [21], and learning the scores of dependencies [22]-[24].…”

Section: Lines (mentioning)

confidence: 99%

Dependency Parsing with Lattice Structures for Resource-Poor Languages

Sudprasert, Kawtrakul, Boitet et al. 2009

IEICE Trans. Inf. & Syst.


SUMMARY: In this paper, we present a new dependency parsing method for languages which have a very small annotated corpus and for which methods of segmentation and morphological analysis producing a unique (automatically disambiguated) result are very unreliable. Our method works on a morphosyntactic lattice factorizing all possible segmentation and part-of-speech tagging results. The quality of the input to syntactic analysis is hence much better than that of an unreliable unique sequence of lemmatized and tagged words. We propose an adaptation of Eisner's algorithm for finding the k-best dependency trees in a morphosyntactic lattice structure encoding multiple results of morphosyntactic analysis. Moreover, we present how to use Dependency Insertion Grammar in order to adjust the scores and filter out invalid trees, the use of a language model to rescore the parse trees, and the k-best extension of our parsing model. The highest parsing accuracy reported in this paper is 74.32%, which represents a 6.31% improvement compared to the model taking the input from the unreliable morphosyntactic analysis tools.
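For orientation, the following is a minimal sketch of the standard first-order, 1-best Eisner algorithm over a plain token sequence. It is not the lattice-based k-best adaptation the summary describes. The function name `eisner_parse` and the score-matrix layout (`scores[h][m]` is the score of attaching token `m` to head `h`, with token 0 as an artificial root) are assumptions made for this illustration.

```python
from typing import List


def eisner_parse(scores: List[List[float]]) -> List[int]:
    """Return the head of each token in the best projective dependency tree.

    scores[h][m] is the score of the edge h -> m; token 0 is the root.
    The result has heads[0] == -1. The root may receive several children.
    """
    n = len(scores)
    # Direction 0: the head is the right end of the span; 1: the left end.
    comp = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    inc = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    comp_bp = [[[0, 0] for _ in range(n)] for _ in range(n)]
    inc_bp = [[0] * n for _ in range(n)]

    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Incomplete spans: join two complete halves and add an edge.
            r = max(range(s, t),
                    key=lambda r: comp[s][r][1] + comp[r + 1][t][0])
            joined = comp[s][r][1] + comp[r + 1][t][0]
            inc[s][t][0] = joined + scores[t][s]  # edge t -> s
            inc[s][t][1] = joined + scores[s][t]  # edge s -> t
            inc_bp[s][t] = r
            # Complete span headed at t.
            r = max(range(s, t),
                    key=lambda r: comp[s][r][0] + inc[r][t][0])
            comp[s][t][0] = comp[s][r][0] + inc[r][t][0]
            comp_bp[s][t][0] = r
            # Complete span headed at s.
            r = max(range(s + 1, t + 1),
                    key=lambda r: inc[s][r][1] + comp[r][t][1])
            comp[s][t][1] = inc[s][r][1] + comp[r][t][1]
            comp_bp[s][t][1] = r

    heads = [-1] * n

    def backtrack(s: int, t: int, d: int, complete: bool) -> None:
        if s == t:
            return
        if complete:
            r = comp_bp[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = inc_bp[s][t]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return heads


# Toy scores for a root plus two tokens; the values are made up.
toy_scores = [
    [0.0, 1.0, 5.0],
    [0.0, 0.0, 2.0],
    [0.0, 4.0, 0.0],
]
print(eisner_parse(toy_scores))
```

The dynamic program runs in cubic time in the sentence length. Extending it to a lattice and to k-best output, as the summary describes, would require additional bookkeeping that this sketch does not attempt.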

