2009
DOI: 10.1007/s10791-008-9080-x

Classifying Amharic webnews

Abstract: We present work aimed at compiling an Amharic corpus from the Web and automatically categorizing the texts. Amharic is the second most spoken Semitic language in the world (after Arabic) and used for countrywide communication in Ethiopia. It is highly inflectional and quite dialectally diversified. We discuss the issues of compiling and annotating a corpus of Amharic news articles from the Web. This corpus was then used in three sets of text classification experiments. Working with a less-researched language h…

Cited by 7 publications
(2 citation statements)
References 31 publications
“…Extensive research studies, which considered the impact of using different indexing approaches (full-word, stem, and root), exist for English [23–25] and other languages [26–29]. In the case of Arabic, there are research studies on classifying Arabic documents: some have been based only on the stem [30–32]; others have been based only on the root form [33, 34].…”
Section: Introduction (mentioning)
confidence: 99%
“…Ethiopians utilize Amharic, a Semitic language, to communicate across the country. It is the world's second most spoken Semitic language (after Arabic), and one of the top five on the African continent (Asker et al., 2009). Despite its enormous number of speakers, little computational linguistic resources have been produced for it, and little has been done to make viable higher-level Internet or computer-based applications available to persons who only speak Amharic (Asker et al., 2009).…”
Section: Amharic Language (mentioning)
confidence: 99%