Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages - Semitic '05 2005
DOI: 10.3115/1621787.1621797

Part of speech tagging for Amharic using conditional random fields

Abstract: We applied Conditional Random Fields (CRFs) to the tasks of Amharic word segmentation and POS tagging using a small annotated corpus of 1000 words. Given the size of the data and the large number of unknown words in the test corpus (80%), an accuracy of 84% for Amharic word segmentation and 74% for POS tagging is encouraging, indicating the applicability of CRFs for a morphologically complex language like Amharic.

Cited by 14 publications (16 citation statements)
References 4 publications
“…The Faculty of Informatics at Addis Ababa University (Ethiopia) has become very active in the field as well, working on, amongst others, Amharic morphological analysis, part-of-speech tagging, machine translation and document classification (Alemayehu and Willett 2002; Weldesellassie 2003; Amsalu and Gibbon 2005; Adafre 2005; Argaw and Asker 2007; Anberbir and Takara 2009; Tachbelie and Menzel 2010). An overview of Amharic language technology research efforts can be found at http://nlp.amharic.org.…”
Section: A Brief Overview Of African Language Technology (mentioning)
confidence: 98%
“…Using small training set of size (1000 words) Adafre et al [27] develop CRF POS tagger for Amharic language. The set of features that are used for training are composed of lexical features, morphological features, dictionary features, the previous two POS tags, and character bi-grams. The accuracy of the tagger is 74% because of the small training data set.…”
Section: E Conditional Random Filed (mentioning)
confidence: 99%
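
The feature families named in the statement above (lexical, morphological, dictionary, character bi-grams) can be illustrated with a minimal Python sketch. It is not the feature set of the cited paper: the function names `char_bigrams` and `token_features`, the two-character prefix and suffix used as a stand-in for morphological features, the `lexicon` argument, and the `<BOS>`/`<EOS>` padding symbols are all assumptions made for illustration. The "previous two POS tags" are left out here, on the assumption that a CRF toolkit would model them through label-transition features rather than per-token features.

```python
from typing import Dict, List, Set


def char_bigrams(word: str) -> List[str]:
    """Return the overlapping character bigrams of a word."""
    return [word[k:k + 2] for k in range(len(word) - 1)]


def token_features(tokens: List[str], i: int, lexicon: Set[str]) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i (hypothetical feature names)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        # Lexical features: the word itself and its immediate neighbours.
        "word": word,
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
        # Crude morphological proxies (assumption): short prefix and suffix.
        "prefix2": word[:2],
        "suffix2": word[-2:],
        # Dictionary feature: is the word listed in a known lexicon?
        "in_lexicon": word in lexicon,
    }
    # Character bigram features, one binary feature per bigram.
    for bigram in char_bigrams(word):
        feats["bigram=" + bigram] = True
    return feats
```

Calling `token_features(tokens, i, lexicon)` for every position in a sentence yields one dictionary per token, which is the input shape that dictionary-based CRF toolkits commonly accept.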
“…Manual annotation: this corpus is annotated morphologically using AncoraPipe annotation tool. Four annotators were involved in this task and annotation speed was between 80 and 120 tokens/hour. Our Inter Annotator Agreement is around 94.98%.…”
Section: Manual Annotation and Amts Tag Set (mentioning)
confidence: 99%
“…CRFs are applied to many NLP fields such as name entity recognition [25], shallow parsing [37], information extraction from tables [33]. CRFs were used for POS-tagging in many languages, such as Amharic [1] and Tamil [24].…”
Section: Our Machine Learners (mentioning)
confidence: 99%