Intelligent Information Processing and Web Mining 2004
DOI: 10.1007/978-3-540-39985-8_43

Trigram morphosyntactic tagger for Polish

Abstract: We introduce an implementation of a plain trigram part-of-speech tagger which appears to work well on Polish texts. At present the tagger achieves a 9.4% error rate, which makes it significantly better than our previous stochastic disambiguator. Since the trigram model for Polish behaves similarly to the one for Czech, we hope to reach the Czech state-of-the-art error rate when the quality of the training data improves.

Cited by 18 publications
(4 citation statements)
References 4 publications
“…Statistical morphological disambiguation using small manually annotated training corpora looks as quite a simple task, when frequencies of grammatical features are generated during the training phase and the most likely sequence of morphological features is found in a new text by the help of various probability methods. Drawing on the experience of morphological annotation systems for other free word order languages (Dębowski, 2004; Hajič et al, 2001; Palanisamy et al, 2006 etc.), it is obvious that the corpus-based method is most suitable for the developing such systems for Lithuanian.…”
Section: Automatic Morphological Annotation Of the Lithuanian Corpus
Citation type: mentioning
confidence: 99%
“…It happens so only for the necessity of the effective search for the most probable hidden states. Some well-known applications of HMMs are automatic speech recognizers (Jelinek, 1997) and trigram part-of-speech taggers (Manning and Schütze, 1999; Dębowski, 2004b). It was observed that the error rate of trigram taggers decreases as a negative power of the size of the training data.…”
Section: B Some Properties Of Infinitary Distributions
Citation type: mentioning
confidence: 99%
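The power-law claim in the statement above can be written compactly. This is only a notational sketch: the symbols for the error rate, the training-data size, and the two constants are assumptions introduced here, and neither the citing text nor the abstract gives values for them.

```latex
% Assumed notation (not from the source):
%   N              = size of the training data
%   \varepsilon(N) = tagging error rate after training on N items
%   C > 0, \beta > 0 = unspecified constants
\varepsilon(N) \approx C \, N^{-\beta}
\quad\Longleftrightarrow\quad
\log \varepsilon(N) \approx \log C - \beta \log N
```

Under this form, the error rate plotted against training-data size on log-log axes would fall roughly on a straight line with slope $-\beta$.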
“…Several techniques of morphosyntactic tagging for Polish have been explored over the years, including trigrams [4], transformation-based methods (TaKIPI [12]; Pantera [1]), conditional random fields (WCRFT [13]; Concraft [20]), and neural networks (Toygger [9]; KRNNT [22]; MorphoDiTa-pl [19]). The latter now obtain state-of-the-art results in the task of morphosyntactic tagging for Polish [7]. All these taggers adopt a pipeline architecture, where morphosyntactic disambiguation (including guessing) is preceded by sentence segmentation, word segmentation, and morphosyntactic analysis (not necessarily in this order).…”
Section: Introduction and Related Work
Citation type: mentioning
confidence: 99%