Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages - MorphSlav '03 2003
DOI: 10.3115/1613200.1613205

A flexemic tagset for Polish

Abstract: The article notes certain weaknesses of current efforts aiming at the standardization of POS tagsets for morphologically rich languages and argues that, in order to achieve clear mappings between tagsets, it is necessary to have clear and formal rules of delimiting POSs and grammatical categories within any given tagset. An attempt at constructing such a tagset for Polish is presented.

Citations: cited by 30 publications (14 citation statements)
References: references 1 publication
“…Many Polish wordforms historically derived as gerunds or participles, such as "oświecenie", possess two meanings: one closer to a noun/adjective ("oświecenie" = enlightenment) and one closer to a verb ("oświecenie" = illumination). In the adopted annotation scheme [6], such a distinction exists for any Polish gerund or participle even if one can hardly figure out the noun-like meaning. Both human annotators and the tagger get confused, which is reflected in the high error rate on the attributes of aspect and negation (definite for verbal and indefinite for non-verbal forms).…”
Section: Preliminary Results (mentioning)
confidence: 99%
“…Contextually valid tags $T_{1:n} = (T_1, T_2, \ldots, T_n)$ can be determined by humans for sufficiently long $W_{1:n} = (W_1, W_2, \ldots, W_n)$ (almost) uniquely but the dependence between these two strings is very complex. The automatic computation of $T_1, T_2, \ldots, T_n$ can be made efficient if some error rate is allowed (even humans disagree for about 3% of tokens in the annotation scheme proposed in [6]). One of heuristic approaches which appears unexpectedly fruitful is trigram model and its modifications.…”
Section: The Model (mentioning)
confidence: 99%
“…The notations for cases are as in the IPI PAN Corpus tagset: nom(inative), gen(itive), dat(ive), acc(usative), inst(rumental), loc(ative), and voc(ative), cf. Przepiórkowski and Woliński (2003) or http://korpus.pl/. For simplicity, it is assumed that no argument type can be repeated in a single valence frame.…”
Section: The Formalism Of Co-occurrence Matrices (mentioning)
confidence: 99%
“…The morphosyntactic specifications are based on the flexemic tagset for Polish (Przepiórkowski and Woliński 2003), used e.g. for the annotation of the IPI PAN corpus of Polish (Przepiórkowski 2006), and this corpus was also taken as the source for constructing the MULTEXT-East lexicon.…”
Section: Multext-east By Language (mentioning)
confidence: 99%