2017
DOI: 10.1177/0165551516683617

Kurdish stemmer pre-processing steps for improving information retrieval

Abstract: The rapid increase in the quantity of Kurdish documents over the last several years has created a need for improving information accuracy and precision in text classification and retrieval. Language stemming is an imperative pre-processing step for increasing the possibility of matching terms in a document in text classification tasks. Stemming helps reduce the total number of searchable terms within a document or query. This article proposes an active approach for stemming Kurdish Sorani texts to reduce varia…

Cited by 20 publications (10 citation statements)
References: 17 publications
“…The Sanad of hadith has been ignored, and we focused on the preprocessing of text. The first step of text classification is to convert the text into clear words format, then into a vector [26], [28], After that, identifying the most common words and the most informative features in the dataset of hadith.…”
Section: B Hadith Text Pre-processing (mentioning)
confidence: 99%
“…Significant cost reductions were also made by the system throughout the documentation and final approval of the reports in the imaging department. Kurdish stemmer pre-processing for improving information retrieval conducted by researcher in [13]. This article introduces the Kurdish stemming-step method.…”
Section: Related Work (mentioning)
confidence: 99%
“…Several studies have been done related to common languages such English [5], [6], Arabic [7]- [9], and Persian [10]- [12]. Moreover, there are few studies which are consummated regarding Kurdish language [13], [14], despite it, a huge gap can be seen in the case of Kurdish Kurmanji dialect; therefore, this study has been aimed to serve this gap due to Kurmanji dialect in the case of creating lemmatization and spell-checker with spell-correction system. Hence, in the future, this study can be used in several applications that include data translation, sentence retrieval, document retrieval, and also can be extend and upgrade to more powerful similar systems.…”
Section: Introduction (mentioning)
confidence: 99%
“…Central Kurdish ( Sorani ) is one of two main dialects of the Kurdish language, it is generally thought that Sorani is spoken by about 9 to 10 million people in Iraq and Iran [ 1 , 2 ]. It is mainly written using a modified Arabic/Persian alphabet containing 34 characters, including characters that have been replaced in recent years like (ك) that's no longer been used by the Kurdish language and replaced with (ک).…”
Section: Data Description (mentioning)
confidence: 99%
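
The character replacement mentioned in the last statement, Arabic kaf (ك, U+0643) giving way to the form (ک, U+06A9), is the kind of normalization a pre-processing pipeline might apply before stemming. The following is only a minimal sketch of that single substitution, not the method of the cited article or of any citing paper; the function name `normalize_kaf` and the one-entry mapping table are assumptions made for illustration.

```python
# Illustrative sketch only: maps the older Arabic kaf to the form now used
# in Central Kurdish (Sorani) text. The table holds a single, assumed entry;
# a real normalizer would likely cover more characters.
ARABIC_KAF = "\u0643"   # ك  ARABIC LETTER KAF
KURDISH_KAF = "\u06A9"  # ک  ARABIC LETTER KEHEH

_KAF_TABLE = str.maketrans({ARABIC_KAF: KURDISH_KAF})


def normalize_kaf(text: str) -> str:
    """Return text with every U+0643 replaced by U+06A9 (hypothetical helper)."""
    return text.translate(_KAF_TABLE)


if __name__ == "__main__":
    sample = "\u0643\u0648\u0631\u062f"  # a word typed with the older kaf
    print(normalize_kaf(sample) == "\u06A9\u0648\u0631\u062f")
```

Running the substitution before tokenization means both spellings of a word map to the same term, which is the same goal the abstract attributes to stemming: fewer distinct searchable terms.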