2008
DOI: 10.1017/s0332586508001820

Tagging Icelandic text: A linguistic rule-based approach

Abstract: The Icelandic language is a morphologically complex language, for which a large tagset has been created. This paper describes the design of a linguistic rule-based system for part-of-speech tagging Icelandic text. The system contains two main components: a disambiguator, IceTagger, and an unknown word guesser, IceMorphy. IceTagger uses a small number of local elimination rules along with a global heuristics component. The heuristics guess the functional roles of the words in a sentence, mark prepositional phra…

Cited by 26 publications (19 citation statements)
References 19 publications
“…A comparison of our model to other previously published taggers for Icelandic is shown in Table 2. The results for TnT, IceTagger and Ice-Stagger are presented in (Loftsson et al., 2009; Loftsson, 2008; Loftsson and Östling, 2013), respectively. All the reported results are fully comparable as they are based on exactly the same cross-validation split of the IFD corpus, with the exception that the TnT tagger does not employ data from DMII, and has therefore a higher ratio of unknown words.…”
Section: Comparison To Other Taggers (mentioning)
confidence: 99%
“…The disambiguation process will be tested using the latest methods of the current research on tagging algorithms, i.e. rule-based [7,9,14,15], stochastic [3,4,5,10,11] and transformation-based [1] algorithms. In order to achieve the best results regarding the morphosyntactic properties of Modern Greek, the most suitable tagger will be selected.…”
Section: Related Work (mentioning)
confidence: 99%
“…For each word class there is a predefined number of additional characters (at most six) which describe morphological features, like gender, number and case for nouns; degree and declension for adjectives; voice, mood and tense for verbs, etc. The reader is referred to (Loftsson, 2006a; Pind et al., 1991) for a more complete description of the tagset.…”
Section: The Icelandic Language, the Tagset and the Corpus (mentioning)
confidence: 99%
“…In this section, we describe four integration methods, all of which have resulted in an improved tagging accuracy of Icelandic text. The first two methods, which consist of integrating our morphological analyser with state-of-the-art DDT, are described in more detail in (Loftsson, 2006a). The latter two methods are new.…”
Section: Integration Of Taggers (mentioning)
confidence: 99%