2020
DOI: 10.1007/978-3-030-58219-7_5

2AIRTC: The Amharic Adhoc Information Retrieval Test Collection

Abstract: Evaluation is essential for designing, developing, and maintaining information retrieval (IR) systems. The IR community has organized shared tasks in which evaluation frameworks, evaluation measures, and test collections have been developed for different languages. Although Amharic is the official language of Ethiopia, a country with an estimated population of over 110 million, it is an under-resourced language, and there is not yet an Amharic ad hoc IR test collection. In this paper, we promote the mon…

Cited by 6 publications (9 citation statements)
References: 16 publications
“…Many IR and NLP applications need stem or root extraction prior to other processes. We conducted a preliminary analysis on the usefulness of stem-based and root-based retrieval using the corpora we built, the 2AIRTC [16] and the Amharic stopword list [33] which are all available at https://www.irit.fr/AmharicResources/. We found that root-based approach is better for retrieving more number of relevant documents.…”
Section: Discussion (mentioning; confidence: 99%)
“…In this paper, we present a collection which consists in two lexicons of 170,000 morphologically annotated Amharic terms where both stems and roots are annotated, as well as corpora of texts where documents have been re-written using these lexicons. These texts are part of the 2AIRTC, the Amharic Adhoc Information Retrieval Test Collection where documents, queries and query relevance are provided [16].…”
Section: Introduction (mentioning; confidence: 99%)
“…We can quote a few such studies. Demeke and Getachew [23] created Walta Information Center news corpus; Yeshambel et al [24] built 2AIRTC; and Yeshambel et al [10] created stem-based and root-based morphologically annotated Amharic corpora semiautomatically. The sizes of corpora created by Demeke and Getachew [23], Yeshambel et al [24] and Yeshambel et al [10] are 1,065, 12,586, and 6,069 documents, respectively.…”
Section: Evaluation of Amharic IR Corpora Resources and NLP Tools (mentioning; confidence: 99%)
“…Demeke and Getachew [23] created Walta Information Center news corpus; Yeshambel et al [24] built 2AIRTC; and Yeshambel et al [10] created stem-based and root-based morphologically annotated Amharic corpora semiautomatically. The sizes of corpora created by Demeke and Getachew [23], Yeshambel et al [24] and Yeshambel et al [10] are 1,065, 12,586, and 6,069 documents, respectively. Mindaye et al [13] and Samuel and Bjorn [25] created Amharic word-based stopword list whereas Alemayehu and Willett [26] built stem-based stopwords list.…”
Section: Evaluation of Amharic IR Corpora Resources and NLP Tools (mentioning; confidence: 99%)