2018
DOI: 10.25079/ukhjse.v2n1y2018.pp48-54

Improving Kurdish Web Mining through Tree Data Structure and Porter’s Stemmer Algorithms

Abstract: Stemming is one of the most important preprocessing techniques for improving the accuracy of text classification. Its key purpose is to merge words that share the same stem, which reduces the high dimensionality of the feature space. A smaller feature space shortens the time needed to construct a model and lowers memory use. In this paper, a new stemming approach is explored for enhancing Kurdish text classification performance. Tree data structure and…
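
The dimensionality argument in the abstract can be illustrated with a minimal sketch. It is not the paper's stemmer: the suffix list, the `toy_stem` function, and the English sample tokens are all hypothetical, chosen only to show how mapping inflected forms to one stem shrinks the vocabulary (the feature space) a classifier has to handle.

```python
# Toy illustration only: a naive suffix stripper, NOT the stemming
# approach proposed in the paper. All names and data are hypothetical.
SUFFIXES = ("ing", "ed", "s")  # assumed suffix list for the demo


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vocabulary(tokens, stem=None):
    """Return the set of distinct features, optionally after stemming."""
    return {stem(t) if stem else t for t in tokens}


tokens = ["walk", "walks", "walked", "walking", "jump", "jumps", "jumped"]
raw = vocabulary(tokens)                # one feature per surface form
stemmed = vocabulary(tokens, toy_stem)  # one feature per stem

print(len(raw), len(stemmed))
```

In a bag-of-words model each distinct entry becomes one feature column, so fewer distinct entries means a smaller matrix to build and store.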

citations
Cited by 7 publications
(3 citation statements)
references
References 12 publications
“…The characters among those languages are almost like each other but sometimes have different Unicode as shown in Table 1 . The number of Sorani Kurdish alphabets is 36 that is divided into vowels (ا, ه‌, و, ۆ, وو, ى, ێ) and consonants (ئ, ب, پ, ت, ج, چ, ح, خ, د, ر, ڕ, ز, ژ, س, ش, ع, غ, ف, ڤ, ق, ک گ, ل, ڵ, م, ن, ه, (و, ى)) [3 , 4] . The character (و, ى) are used as vowels and constants based on the positions of the word, for example, the word (گوڵ) (gull) means (Flower), the (و) is a vowel, while the word (وازى) (wazi) means (game), the (و) is constant, the word (يارى) (yari) means (play), and the first (ى) is constant, by contrast, the second one is a vowel.…”
Section: Data Description (citation type: mentioning)
confidence: 99%
“…The character (و, ى) are used as vowels and constants based on the positions of the word, for example, the word (گوڵ) (gull) means (Flower), the (و) is a vowel, while the word (وازى) (wazi) means (game), the (و) is constant, the word (يارى) (yari) means (play), and the first (ى) is constant, by contrast, the second one is a vowel. The Kurdish language is complex and has different scripts (no standard) for Sorani, dialect for example, in some sources (ك) is used instead of (ک) [3 , 4] .…”
Section: Data Description (citation type: mentioning)
confidence: 99%
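
The script variation noted in the statement above, where some sources write (ك) instead of (ک), is commonly handled by mapping variant code points to one form before any further text processing. The sketch below assumes the two characters are U+0643 (ARABIC LETTER KAF) and U+06A9 (ARABIC LETTER KEHEH); the `normalize` function and the one-entry mapping are hypothetical and do not come from the cited papers.

```python
import unicodedata

# Assumed code points for the two variants mentioned in the statement.
ARABIC_KAF = "\u0643"  # ك
KEHEH = "\u06a9"       # ک

# Hypothetical one-entry mapping; a real normalizer would cover more variants.
VARIANT_TABLE = str.maketrans({ARABIC_KAF: KEHEH})


def normalize(text: str) -> str:
    """Replace variant letters with the single form chosen for this sketch."""
    return text.translate(VARIANT_TABLE)


for ch in (ARABIC_KAF, KEHEH):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Without such a step, two spellings of the same word that differ only in this letter would be counted as different features.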