1997
DOI: 10.1002/(sici)1097-4571(199710)48:10<867::aid-asi3>3.0.co;2-#

Design and implementation of automatic indexing for information retrieval with Arabic documents

Abstract: We have put together a corpus of 242 abstracts of Arabic documents using the Proceedings of the Saudi Arabian National Conferences as a source. All these abstracts involve computer science and information systems. We also designed and built an automatic information retrieval system from scratch to handle Arabic data. The system was implemented in the C language using the GCC compiler and runs on IBM/PCs and compatible microcomputers. We have implemented both automatic and manual indexing techniques for this co…

Cited by 36 publications (26 citation statements)
References: 6 publications
“…Chen and Gey [13] proposed an approach to the cross language retrieval which was to translate the English topics into Arabic using online English-Arabic machine translation systems, and they reported on the construction of an Arabic stop list and two Arabic stemmers, and the experiments on Arabic monolingual retrieval, English-to-Arabic cross-language retrieval. Hmeidi, Kanaan and Evens [14] have put together a corpus and designed and built an automatic IR system from scratch to handle Arabic data. They have implemented both automatic and manual indexing techniques for this corpus.…”
Section: Related Work (mentioning)
confidence: 99%
“…Some of the Arabic IR systems that use morphology include Swift [1] and electronic publishing software developed by Sakhr that contain IR components (such as the Encyclopedia of Jurisprudence) [2]. Arabic IR studies have shown that the use of Arabic roots as indexing terms substantially improves the retrieval effectiveness over the use of words as index terms [3] [4] [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…However, this paper is concerned with morphological analysis for the purpose of IR. Arabic IR is enhanced when the roots are used in indexing and searching [3] [4] [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Recall that ALPNET produces analysis in random order. As indicated earlier, some early work with small test collections (Al-Kharashi & Evens, 1994; Hmeidi et al., 1997) suggested that roots were a better choice than stems, but the experiments presented here found just the opposite. One possible explanation for this is that earlier test collections contained at most a few hundred documents, and scaling up the size of the collection by several orders of magnitude might reward the choice of less ambiguous terms.…”
Section: Evaluating Sebawai and Al-stem in IR (mentioning)
confidence: 40%
“…However, often irregular roots, which contain double or weak letters, lead to stems and words that have letters from the root that are deleted or replaced. For Arabic IR, several early studies suggested that indexing Arabic text using roots significantly increases retrieval effectiveness over the use of words or stems (Abu-Salem et al., 1999; Al-Kharashi & Evens, 1994; Hmeidi et al., 1997). However, the studies used small test collections of only hundreds of documents and the morphology in many of the studies was done manually.…”
Section: Introduction (mentioning)
confidence: 99%