Ari M. Saeed scite author profile

Stemming is one of the most significant preprocessing stages in text categorization, and many researchers work on it to improve the accuracy of the classification task. High dimensionality of the feature space is one of the challenges in text classification, and it can be reduced by many techniques. Stemming reduces it by grouping words that share the same grammatical forms and mapping them to their root. This work builds an approach for Kurdish text classification using the Reber stemmer. The approach obtains the stem of a Kurdish word by removing its longest suffix and prefixes, deleting as many of the relevant affixes as possible. An advantage of this stemmer is that it does not depend on the ordering of the affix list, so it returns the correct stem for multiple words that share the same form. The stemming technique is applied to the KDC-4007 dataset, which consists of eight classes. Support Vector Machine (SVM) and Decision Tree (DT, or C4.5) classifiers are used for the classification. The stemmer is compared with the Longest-Match stemming technique. According to the results, the F-measure of both the Reber stemmer and the Longest-Match method is higher with SVM than with DT. With SVM, the Reber stemmer obtained a higher F-measure for the religion, sport, health, and education classes, while for the remaining classes the F-measure was lower with Longest-Match. With DT, the Reber stemmer obtained a higher F-measure for the religion, sport, and art classes, while for the remaining classes the F-measure was lower with Longest-Match.
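The idea of stripping the longest matching prefix and suffix, independent of the order in which affixes are listed, can be illustrated with a minimal sketch. This is not the Reber stemmer itself: the function name `strip_longest_affixes`, the `min_stem` threshold, and the English-like affix lists are assumptions made purely for illustration, and a real Kurdish stemmer would need its own affix inventory and rules.

```python
# Hypothetical, English-like affix lists used only for illustration;
# they are NOT the affixes used by the Reber stemmer.
PREFIXES = ("un", "re")
SUFFIXES = ("s", "ed", "ing", "ings")


def strip_longest_affixes(word, prefixes=PREFIXES, suffixes=SUFFIXES, min_stem=3):
    """Remove the longest matching prefix, then the longest matching suffix.

    min_stem is an assumed safeguard: an affix is only removed if at least
    this many characters remain afterwards.
    """
    # Trying affixes from longest to shortest makes the result independent
    # of the order in which the affix lists happen to be written.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ("rebuildings", "unlocked", "reds"):
        print(w, "->", strip_longest_affixes(w))
```

Because candidates are tried longest first, "rebuildings" loses "ings" rather than only "s", whichever order the suffix tuple is written in.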


Medical dataset classification for Kurdish short text over social media

Saeed, Hussein, Ali, et al. 2022

Data in Brief


Hate Speech Detection in Social Media for the Kurdish Language

Saeed, Ismael, Rasul, et al. 2022


Improving Kurdish Web Mining through Tree Data Structure and Porter’s Stemmer Algorithms

et al. 2018


Stemming is one of the most important preprocessing techniques for enhancing the accuracy of text classification. Its key purpose is to merge words that share the same stem in order to reduce the high dimensionality of the feature space. A smaller feature space shortens the time needed to construct a model and lowers memory use. In this paper, a new stemming approach is explored for enhancing Kurdish text classification performance. A tree data structure and Porter's stemmer algorithm are combined to build the proposed approach. The system is assessed using a support vector machine and a decision tree (C4.5) to compare performance before and after applying the suggested stemmer. Furthermore, the usefulness of using stop words is considered before and after applying the suggested approach.
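How merging words with the same stem shrinks the feature space can be seen in a small bag-of-words sketch. The `toy_stem` function and the English sample tokens are assumptions for illustration only; they stand in for the paper's tree-based stemmer, whose details are not given here.

```python
from collections import Counter


def toy_stem(word, suffixes=("ing", "ed", "s")):
    """Hypothetical stand-in stemmer: strip the longest matching suffix."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def bag_of_words(tokens, stem=None):
    """Count term occurrences, optionally mapping each token to its stem first."""
    return Counter(stem(t) if stem else t for t in tokens)


# Illustrative English tokens, not taken from any Kurdish dataset.
tokens = "play plays played playing walk walks walked".split()

raw = bag_of_words(tokens)
stemmed = bag_of_words(tokens, toy_stem)

if __name__ == "__main__":
    print("features without stemming:", len(raw))
    print("features with stemming:", len(stemmed))
```

Each distinct key in the counter is one feature dimension, so the classifier trained on the stemmed counts sees fewer, denser features.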



Ari M. Saeed

Automatic Kurdish Text Classification Using KDC 4007 Dataset

An evaluation of Reber stemmer with longest match stemmer technique in Kurdish Sorani text classification

