2016
DOI: 10.5120/ijca2016908328

Improving Arabic Text Categorization using Normalization and Stemming Techniques

Abstract: Text categorization is the task of assigning documents, based on their contents, to one or more predefined categories. Achieving the highest possible categorization accuracy remains a major challenge, and the process is also time-consuming. We propose an approach to tackle these challenges. The proposed approach uses the Frequency Ratio Accumulation Method (FRAM) as a classifier. Features are represented using the bag-of-words technique, and an improved Term Frequency (TF) technique is used for feature selection. The proposed a…
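The abstract names three components: bag-of-words features, TF-based feature selection, and the FRAM classifier. The title adds normalization. The Python sketch that follows combines these components under several stated assumptions:

- The normalization rules are a common Arabic preprocessing choice, not the paper's exact rules.
- Feature selection uses plain per-category term frequency, because the "improved TF" variant is not described here.
- No stemmer is included.
- FRAM follows its usual description: per-category normalized frequencies, their ratio across categories, and a sum over the document's tokens.

The names `normalize`, `tokenize`, `FRAM`, and `top_k` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed normalization rules (a common Arabic preprocessing choice,
# not necessarily the rules used in the paper).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")   # tanween, harakat, shadda, sukun, dagger alef
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")   # alef with madda, hamza above, hamza below


def normalize(text: str) -> str:
    text = _DIACRITICS.sub("", text)           # strip short-vowel marks
    text = text.replace("\u0640", "")          # strip tatweel (kashida)
    text = _ALEF_VARIANTS.sub("\u0627", text)  # unify alef forms to bare alef
    text = text.replace("\u0649", "\u064A")    # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")    # ta marbuta -> ha
    return text


def tokenize(text: str) -> list:
    """Bag-of-words tokens after normalization (no stemming in this sketch)."""
    return re.findall(r"\w+", normalize(text))


class FRAM:
    """Frequency Ratio Accumulation Method, as commonly described:
    NF[c][t] = count of t in c / total count of selected terms in c
    FR[c][t] = NF[c][t] / sum over all categories c' of NF[c'][t]
    score(d, c) = sum of FR[c][t] over every token t of document d
    """

    def __init__(self, top_k=None):
        # Hypothetical plain-TF feature selection: keep each category's top_k
        # most frequent terms (None keeps every term).
        self.top_k = top_k
        self.fr = {}

    def fit(self, docs, labels):
        counts = defaultdict(Counter)  # category -> term frequencies
        for doc, label in zip(docs, labels):
            counts[label].update(tokenize(doc))

        # Union of each category's most frequent terms forms the feature set.
        vocab = set()
        for cnt in counts.values():
            vocab.update(t for t, _ in cnt.most_common(self.top_k))

        # Normalized frequency of each selected term within each category.
        nf = {}
        for label, cnt in counts.items():
            total = sum(cnt[t] for t in vocab)
            nf[label] = {t: cnt[t] / total for t in vocab} if total else {}

        # Frequency ratio: share of a term's normalized frequency held by each category.
        self.fr = {label: {} for label in counts}
        for t in vocab:
            denom = sum(nf[label].get(t, 0.0) for label in counts)
            if denom == 0:
                continue
            for label in counts:
                self.fr[label][t] = nf[label].get(t, 0.0) / denom
        return self

    def scores(self, doc):
        tokens = tokenize(doc)  # unseen tokens contribute 0 to every category
        return {label: sum(fr.get(t, 0.0) for t in tokens)
                for label, fr in self.fr.items()}

    def predict(self, doc):
        s = self.scores(doc)
        return max(s, key=s.get)


if __name__ == "__main__":
    docs = ["كرة القدم مباراة", "مباراة الفريق هدف",
            "الاقتصاد السوق أسعار", "أسعار النفط السوق"]
    labels = ["sport", "sport", "economy", "economy"]
    model = FRAM().fit(docs, labels)
    print(model.predict("مباراة هدف"))   # sport
    print(model.predict("السوق اسعار"))  # economy: "أسعار" and "اسعار" match after normalization
```

FRAM sums ratios instead of multiplying probabilities, as Naive Bayes does. A term that is frequent in only one category therefore contributes close to 1 to that category's score and close to 0 to the others. This is the intuition behind the method.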

Cited by 15 publications (7 citation statements)
References 16 publications

“…On PS and CV test mode, J48 had the most significant accuracy decrease from 76.3% to 64.2% in PS mode, and 69.69% to 62.6% in CV mode. Similar research conclusion was also obtained by [21] in their research. They performed Arabic text classification using three datasets that were taken from two trusted sources.…”
Section: Related Work (supporting)
confidence: 89%
“…The PIR is computed as formulated the Eq. (35). In case of RW-CB-Twitter, the PIRs of the STTM models LDA, BTM, PTM, GLTM, FTM, and WNTM are 1.82%, 1.96%, 2.36%, 2.51%, 3.05%, and 2.73%, respectively.…”
Section: Topic Coherence Evaluation Results With Smdcm and Baseline... (mentioning)
confidence: 96%
“…Duwairi and El-Orfali [31] studied the impact of various pre-processing techniques like n-gram models, feature correlation on Arabic text sentiment analysis. Some other works studied the impacts of stemming on the Arabic text classification performance, such as [32], [33], [34], and [35].…”
Section: Preprocessing (mentioning)
confidence: 99%
“…SANAD comprises of the above three datasets, which makes it the largest, to our knowledge, available and representative corpus. In contrast with other few available datasets such as those used in [2], [3], [4], SANAD is large enough to enable researchers to implement classical and deep learning models for text classification as it is the case in [1], which used for sentiment classification. Few similar datasets already exist but are not comparable in size and have less tags.…”
Section: Experimental Design, Materials and Methods (mentioning)
confidence: 99%