2008
DOI: 10.1007/s10791-008-9081-9

Towards enhancing retrieval effectiveness of search engines for diacritisized Arabic documents

Abstract: The majority of Arabic text available on the web is written without short vowels (diacritics). Diacritics are commonly used in religious texts such as the Holy Quran (the holy book of Islam) and Al-Hadith (the teachings of Prophet Mohammad (PBUH)), in children's literature, and in some words whose pronunciation would otherwise be ambiguous. Arabic Internet users may fail to retrieve credible sources of Arabic text if their queries do not match the diacritical marks attached to the words in the collection. However, t…
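To make the matching problem concrete: a vowelized word and its bare form are different character strings, so an exact-match engine treats them as different terms. The sketch below shows one common normalization, stripping the Arabic diacritic code points before comparison. It is an illustrative assumption, not necessarily the method the paper proposes; the helper names `strip_diacritics` and `matches` are hypothetical.

```python
# Arabic diacritics assumed here: fathatan (U+064B) through sukun (U+0652),
# which includes the short vowels, tanween, and shadda, plus superscript alef (U+0670).
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritical marks so vowelized and bare forms compare equal."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


def matches(query: str, document_word: str) -> bool:
    """Hypothetical matcher: compare two words after stripping diacritics from both."""
    return strip_diacritics(query) == strip_diacritics(document_word)


if __name__ == "__main__":
    vowelized = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, fully vowelized
    print(matches(vowelized, "\u0643\u062a\u0628"))  # True: both reduce to the bare form
```

The trade-off is that stripping discards distinctions the diacritics carry: the vowelized forms for "he wrote" (kataba) and "books" (kutub) both reduce to the same three letters, which can raise recall at some cost in precision.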

Cited by 33 publications (21 citation statements)
References: 35 publications
“…In his study of the Holy Quran, Hammo [24] stated that most of the cases where Khoja failed when stemming words of the Holy Book occurred on proper names (such as the names of prophets, angels, ancient cities, places, and people), on numerals, and on words carrying the diacritical mark shadda.…”
Section: Root-based and Morphological Analyzers (mentioning)
confidence: 99%
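Shadda marks a doubled (geminated) consonant, so deleting it together with the vowels throws away a letter that a root-based stemmer may need. One hypothetical preprocessing step is to expand each shadda into a second copy of the preceding letter and drop the remaining diacritics. This is a sketch under that assumption; it is not how the Khoja stemmer works, and `expand_shadda` is not taken from the cited studies.

```python
SHADDA = "\u0651"
# Other marks to drop (assumption): U+064B..U+0652 minus shadda, plus superscript alef.
OTHER_DIACRITICS = ({chr(cp) for cp in range(0x064B, 0x0653)} - {SHADDA}) | {"\u0670"}


def expand_shadda(word: str) -> str:
    """Write a shadda-marked consonant twice and remove all other diacritics."""
    out = []
    last_base = None  # most recent non-diacritic character
    for ch in word:
        if ch == SHADDA:
            if last_base is not None:
                out.append(last_base)  # gemination: repeat the consonant
        elif ch in OTHER_DIACRITICS:
            continue  # drop short vowels, tanween, sukun
        else:
            out.append(ch)
            last_base = ch
    return "".join(out)


if __name__ == "__main__":
    mudarris = "\u0645\u064f\u062f\u064e\u0631\u0651\u0650\u0633"  # "teacher", vowelized
    print(expand_shadda(mudarris))
```

Because the loop only remembers the last base letter, it gives the same result whether a vowel mark is stored before or after the shadda.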
“…Stemming is the process of conflating several terms into one common representation in the base form [16]. It reduces the index size, and therefore storage requirements, by eliminating redundant word variants.…”
Section: Introduction (mentioning)
confidence: 99%
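The index-size claim can be illustrated with a toy inverted index: when every token passes through a normalizing function before indexing, variants that normalize to the same form share one entry. The sketch below assumes a hypothetical "stemmer" that only removes the definite article "ال"; real stemmers do far more.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


def build_index(docs: List[List[str]], normalize: Callable[[str], str]) -> Dict[str, Set[int]]:
    """Map each normalized term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        for token in tokens:
            index[normalize(token)].add(doc_id)
    return dict(index)


def strip_definite_article(word: str) -> str:
    """Hypothetical toy stemmer: drop a leading "ال" if at least three letters remain."""
    return word[2:] if word.startswith("ال") and len(word) > 4 else word


if __name__ == "__main__":
    docs = [["الكتاب", "المدرسة"], ["كتاب", "مدرسة"]]
    print(len(build_index(docs, lambda w: w)))           # 4 distinct terms unstemmed
    print(len(build_index(docs, strip_definite_article)))  # 2 distinct terms after conflation
```

Conflation also changes what a query matches: after stemming, the term "كتاب" points to both documents instead of one.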
“…Stemming uses morphological heuristics to remove affixes from words, and its processing cost is relatively low. For these reasons, stemming is important and highly attractive for many natural language processing (NLP) fields, such as information retrieval (IR), question answering (QA), information extraction (IE), machine translation (MT), text summarization (TS), text classification (TC), text clustering (TClu), text segmentation (TS), indexing (Ind), and automatic speech recognition (ASR) [16]. Many algorithmic stemmers and morphological analysis approaches have been developed to conflate morphologically related forms under the same stem [14].…”
Section: Introduction (mentioning)
confidence: 99%
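As an example of affix removal by heuristics, the sketch below strips at most one prefix and one suffix from fixed lists while keeping a minimum stem length. It is loosely in the spirit of published Arabic light stemmers, but the affix lists, the single-pass suffix rule, and `MIN_STEM = 3` are illustrative assumptions rather than any specific stemmer from the cited work.

```python
# Illustrative affix lists (assumption), ordered longest first so the longest match wins.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM = 3  # never strip an affix if fewer than three letters would remain


def light_stem(word: str) -> str:
    """Remove at most one prefix, then at most one suffix, from an Arabic word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[: -len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ["والكتاب", "المعلمون", "كتب"]:
        print(w, "->", light_stem(w))
```

The minimum-length guard is what keeps short words such as "كتب" intact; without it, a list-based stripper would readily cut letters that belong to the root.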
“…The growing number of Arabic documents on the Web signals the need for advanced and improved Web search engines that retrieve relevant Arabic documents accurately and quickly in response to user requests. Precision (the percentage of retrieved documents that are relevant) and recall are the measures used to determine an IR system's effectiveness and correctness (Abdelmgeid, 2007; Hammo, 2009).…”
Section: Introduction (mentioning)
confidence: 99%
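Using the standard definitions, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The sketch below computes both from sets of document ids; the ids in the example are made up for illustration.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall); each is 0.0 when its denominator is empty."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    retrieved = {"d1", "d2", "d3", "d4"}
    relevant = {"d1", "d2", "d5"}
    print(precision_recall(retrieved, relevant))  # 2 hits: precision 2/4, recall 2/3
```

Returning 0.0 for an empty denominator is a convention chosen here to avoid division by zero; evaluation toolkits may handle that case differently.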