An evaluation of Reber stemmer with longest match stemmer technique in Kurdish Sorani text classification

Saeed, Ari M.; Rashid, Tarik A.; Mustafa, Arazo M.; Agha, Rawan A. Al-Rashid; Shamsaldin, Ahmed S.; Al-Salihi, Nawzad K.

doi:10.1007/s42044-018-0007-4

Cited by 22 publications

(13 citation statements)

References 14 publications

“…The Kurdish language is one of the languages of the Middle East that is used for speaking by Kurdish people. Central Kurdish (Sorani) and Kurmanji are two popular dialects of the Kurdish language [1 , 2] . In this project, the Sorani dialect is used for collecting the database.…”

Section: Data Description (mentioning)

confidence: 99%

Medical dataset classification for Kurdish short text over social media

et al. 2022

Self Cite

“…Researchers have developed accurate supervised text categorization, but for the Kurdish language, text categorization is extremely difficult; challenges include complicated morphologies of the Kurdish language. The reason beyond this difficulty is the enormous utilization of inflectional and derivational affixes (Rashid et al, 2018;Saeed et al, 2018;Rashid et al, 2016). In addition, there are challenges associated with writing in the Kurdish Sorani dialect, which commonly uses suffixes and possessive pronouns (Rashid et al, 2018;Saeed et al, 2018;Rashid et al, 2016).…”

Section: Literature Review (mentioning)

confidence: 99%

“…The reason beyond this difficulty is the enormous utilization of inflectional and derivational affixes (Rashid et al, 2018;Saeed et al, 2018;Rashid et al, 2016). In addition, there are challenges associated with writing in the Kurdish Sorani dialect, which commonly uses suffixes and possessive pronouns (Rashid et al, 2018;Saeed et al, 2018;Rashid et al, 2016). Building lemmatizers and spell checkers for Kurdish Sorani is possible because of the expansion of digital text.…”

Section: Literature Review (mentioning)

confidence: 99%

Next word prediction based on the N-gram model for Kurdish Sorani and Kurmanji

Hamarashid

Saeed

Rashid

2020

Neural Comput & Applic

Self Cite

Next word prediction is an input technology that simplifies the process of typing by suggesting the next word to a user to select, as typing in a conversation consumes time. A few previous studies have focused on the Kurdish language, including the use of next word prediction. However, the lack of a Kurdish text corpus presents a challenge. Moreover, the lack of a sufficient number of N-grams for the Kurdish language, for instance, five grams, is the reason for the rare use of next Kurdish word prediction. Furthermore, the improper display of several Kurdish letters in the RStudio software is another problem. This paper provides a Kurdish corpus, creates five, and presents a unique research work on next word prediction for Kurdish Sorani and Kurmanji. The N-gram model has been used for next word prediction to reduce the amount of time while typing in the Kurdish language. In addition, little work has been conducted on next Kurdish word prediction; thus, the N-gram model is utilized to suggest text accurately. To do so, R programming and RStudio are used to build the application. The model is 96.3% accurate.
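
The N-gram approach summarized in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says the application was built in R, whereas this sketch uses Python, and the function names `build_ngram_model` and `predict_next`, the back-off strategy, and the English placeholder corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_ngram_model(sentences, max_n=3):
    """Count next-word frequencies for every context of 0..max_n-1 words.

    Hypothetical helper: maps a tuple of preceding words to a Counter
    of the words observed immediately after that context.
    """
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for n in range(max_n):
                if i - n < 0:
                    break  # not enough preceding words for this context length
                context = tuple(tokens[i - n:i])
                model[context][word] += 1
    return model


def predict_next(model, text, k=3, max_n=3):
    """Suggest up to k next words, backing off to shorter contexts."""
    tokens = text.split()
    for n in range(max_n - 1, -1, -1):
        if n > len(tokens):
            continue
        context = tuple(tokens[len(tokens) - n:])
        if context in model:
            return [w for w, _ in model[context].most_common(k)]
    return []


# Placeholder corpus (assumption): a real system would train on a Kurdish corpus.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = build_ngram_model(corpus)
suggestion = predict_next(model, "sat on", k=1)
```

The sketch tries the longest available context first and falls back to shorter ones, so an unseen word pair still yields a suggestion from the single preceding word or from overall word frequency.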

“…In this research work, KDC-4007 dataset was used to evaluate the proposed approach. SVM and decision tree (C4.5) Rashid et al, 2017;Saeed et al, 2018) are applied as two common machine learning algorithms for comparing the results. The dataset is partitioned for two inequality portions.…”

Section: Classification (mentioning)

confidence: 99%

Improving Kurdish Web Mining through Tree Data Structure and Porter’s Stemmer Algorithms

et al. 2018

Self Cite

Stemming is one of the main important preprocessing techniques that can be used to enhance the accuracy of text classification. The key purpose of using the stemming is combining the number of words that have the same stem to decrease high dimensionality of feature space. Reducing feature space causes to decline time to construct a model and minimize the memory space. In this paper, a new stemming approach is explored for enhancing Kurdish text classification performance. Tree data structure and Porter's stemmer algorithms are incorporated for building the proposed approach. The system is assessed through using support vector machine and decision tree (C4.5) to illustrate the performance of the suggested stemmer after and before applying it. Furthermore, the usefulness of using stop words is considered before and after implementing the suggested approach.
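
The effect described in the abstract above, where words sharing a stem collapse into one feature, can be sketched with a longest-match suffix stripper, the general technique named in the title of the cited paper. This is a rough illustration only: it is neither the Reber stemmer nor the tree and Porter based approach, and the suffix list (Latin-transliterated Sorani endings), the `min_stem_len` threshold, and the function names are assumptions rather than anything taken from the papers.

```python
# Illustrative toy suffix list (assumption), in Latin transliteration:
# -ekan (definite plural), -eke (definite singular), -an (plural), -êk (indefinite).
SUFFIXES = ("ekan", "eke", "an", "êk")


def longest_match_stem(word, suffixes=SUFFIXES, min_stem_len=3):
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[:-len(suffix)]
    return word


def vocabulary(tokens, stem=False):
    """Return the set of distinct features, optionally after stemming."""
    return {longest_match_stem(t) if stem else t for t in tokens}


tokens = ["kitêbekan", "kitêbeke", "kitêbêk", "kitêb", "jinan", "jin"]
raw_features = vocabulary(tokens)
stemmed_features = vocabulary(tokens, stem=True)
```

Here six surface forms reduce to two features, which is the kind of feature-space reduction the abstract credits with lowering model construction time and memory use. Trying longer suffixes first matters: checking "an" before "ekan" would leave the residue "kitêbek" instead of the stem.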
