2018
DOI: 10.1007/s42044-018-0007-4

An evaluation of Reber stemmer with longest match stemmer technique in Kurdish Sorani text classification

Abstract: Stemming is one of the most significant preprocessing stages in text categorization, and many researchers aim to improve it in order to optimize the accuracy of the classification task. High dimensionality of the feature space is one of the challenges in text classification, and it can be reduced by many techniques. In stemming, the dimensionality of the feature space is reduced by grouping words that are grammatical variants of the same form and reducing them to their root. This work is dedicated to building an appro…
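The abstract's point that stemming shrinks the feature space by mapping word variants to a single root can be illustrated with a longest-match suffix stripper. The following is a minimal sketch under stated assumptions: the suffix list, the minimum stem length, and the example tokens are hypothetical Latin-transliterated stand-ins, not the affix lists of the Reber or longest-match stemmers evaluated in the paper, and only suffixes (not prefixes) are handled.

```python
# Minimal longest-match suffix-stripping stemmer (illustrative sketch only).
# ASSUMPTION: HYPOTHETICAL_SUFFIXES is a made-up, Latin-transliterated sample;
# it is NOT the affix list used by the stemmers evaluated in the paper.
HYPOTHETICAL_SUFFIXES = ["ekan", "eke", "an", "ek", "em"]
MIN_STEM_LEN = 3  # hypothetical guard against over-stemming short words


def longest_match_stem(word: str, suffixes=HYPOTHETICAL_SUFFIXES,
                       min_len: int = MIN_STEM_LEN) -> str:
    """Strip the single longest suffix matching the end of `word`,
    provided the remaining stem keeps at least `min_len` characters."""
    # Try longer suffixes first: this is what makes the match "longest".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            return word[: -len(suffix)]
    return word  # no suffix matched: keep the word unchanged


def vocabulary(tokens, stem=None) -> set:
    """Return the set of distinct features, optionally after stemming."""
    return {stem(t) if stem else t for t in tokens}


# Hypothetical tokens: variants of two roots.
tokens = ["kiteb", "kitebekan", "kitebeke", "kiteban", "mal", "malekan"]
raw = vocabulary(tokens)
stemmed = vocabulary(tokens, longest_match_stem)
print(len(raw), len(stemmed))  # prints: 6 2
print(sorted(stemmed))         # prints: ['kiteb', 'mal']
```

Trying the longest suffix first matters: `kitebekan` also ends in `an`, so a shortest-first strategy would leave `kitebek` as a separate feature, whereas the longest-match rule removes `ekan` and merges it with `kiteb`.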

Cited by 22 publications (13 citation statements); references 14 publications.
“…The Kurdish language is one of the languages of the Middle East that is used for speaking by Kurdish people. Central Kurdish (Sorani) and Kurmanji are two popular dialects of the Kurdish language [1 , 2] . In this project, the Sorani dialect is used for collecting the database.…”
Section: Data Description (mentioning, confidence: 99%)
“…Researchers have developed accurate supervised text categorization, but for the Kurdish language, text categorization is extremely difficult; challenges include complicated morphologies of the Kurdish language. The reason beyond this difficulty is the enormous utilization of inflectional and derivational affixes (Rashid et al, 2018;Saeed et al, 2018;Rashid et al, 2016). In addition, there are challenges associated with writing in the Kurdish Sorani dialect, which commonly uses suffixes and possessive pronouns (Rashid et al, 2018;Saeed et al, 2018;Rashid et al, 2016).…”
Section: Literature Review (mentioning, confidence: 99%)
“…The reason beyond this difficulty is the enormous utilization of inflectional and derivational affixes (Rashid et al, 2018;Saeed et al, 2018;Rashid et al, 2016). In addition, there are challenges associated with writing in the Kurdish Sorani dialect, which commonly uses suffixes and possessive pronouns (Rashid et al, 2018;Saeed et al, 2018;Rashid et al, 2016). Building lemmatizers and spell checkers for Kurdish Sorani is possible because of the expansion of digital text.…”
Section: Literature Review (mentioning, confidence: 99%)
“…In this research work, KDC-4007 dataset was used to evaluate the proposed approach. SVM and decision tree (C4.5) Rashid et al, 2017;Saeed et al, 2018) are applied as two common machine learning algorithms for comparing the results. The dataset is partitioned for two inequality portions.…”
Section: Classification (mentioning, confidence: 99%)
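The quoted statement says the KDC-4007 data was partitioned into two unequal portions before training the SVM and C4.5 classifiers, but the excerpt does not give the ratio. A minimal standard-library sketch of such an unequal split follows; the 70/30 ratio, per-class stratification, fixed seed, and category labels are all assumptions for illustration, not details taken from the cited work.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split document indices into unequal train/test portions, applying the
    same (hypothetical) train_fraction within every class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical labels: 10 documents in each of two categories.
labels = ["sport"] * 10 + ["health"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # prints: 14 6
```

Stratifying per class keeps each category's share the same in both portions, which matters when per-class accuracy of classifiers such as SVM and C4.5 is compared.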