Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021
DOI: 10.1145/3404835.3463237

Morphologically Annotated Amharic Text Corpora

Abstract: In information retrieval (IR), documents that match the query are retrieved. Search engines usually conflate word variants into a common stem when indexing documents because queries and documents do not need to use exactly the same word variant for the documents to be relevant. Stemmers are known to be effective in many languages for IR. However, there are still languages where stemmers or morphological analyzers are missing; this is the case for Amharic, which is the working language of Ethiopia. Morphological…

Cited by 6 publications (11 citation statements)
References: 22 publications
“…We can quote a few such studies. Demeke and Getachew [23] created Walta Information Center news corpus; Yeshambel et al [24] built 2AIRTC; and Yeshambel et al [10] created stem-based and root-based morphologically annotated Amharic corpora semiautomatically. The sizes of corpora created by Demeke and Getachew [23], Yeshambel et al [24] and Yeshambel et al [10] are 1,065, 12,586, and 6,069 documents, respectively.…”
Section: Evaluation of Amharic IR Corpora Resources and NLP Tools (mentioning)
confidence: 99%
“…Demeke and Getachew [23] created Walta Information Center news corpus; Yeshambel et al [24] built 2AIRTC; and Yeshambel et al [10] created stem-based and root-based morphologically annotated Amharic corpora semiautomatically. The sizes of corpora created by Demeke and Getachew [23], Yeshambel et al [24] and Yeshambel et al [10] are 1,065, 12,586, and 6,069 documents, respectively. Mindaye et al [13] and Samuel and Bjorn [25] created Amharic word-based stopword list whereas Alemayehu and Willett [26] built stem-based stopwords list.…”
Section: Evaluation of Amharic IR Corpora Resources and NLP Tools (mentioning)
confidence: 99%