2021
DOI: 10.30534/ijatcse/2021/061012021

The effects of Pre-Processing Techniques on Arabic Text Classification

Abstract: Over the last two decades, the amount of Arabic text available on the World Wide Web has grown dramatically, making Arabic the fourth most used language on the web. Accordingly, the demand for efficient Arabic text classification is increasing, especially for web-page content filtering, information retrieval, and e-mail spam detection. Several machine learning algorithms have been implemented to classify Arabic documents. However, the results achieved are not comparable with those obtained in other languages s…

Cited by 21 publications (9 citation statements). References: 32 publications.
“…These stemmers are ARLSTem v1.0 [16], Tashaphyne, the Information Science Research Institute (ISRI) stemmer [17], and the stemmer included in MADAMIRA [18]. c) Lemmatization: Lemmatization has recently proved to be beneficial for Arabic text classifiers [19]-[21].…”
Section: Text Processing Techniques (mentioning)
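To make the idea of stemming concrete, the following is a minimal light-stemming sketch in Python. It is a toy under stated assumptions: the prefix and suffix lists, the `MIN_STEM` guard, and the function name `light_stem` are all hypothetical, and the rules do not reproduce ARLSTem, Tashaphyne, ISRI, or the MADAMIRA stemmer. (NLTK provides `ISRIStemmer` and `ARLSTem` classes for real experiments.)

```python
# Toy Arabic light stemmer: strips at most one prefix and one suffix.
# The affix lists are illustrative assumptions, not the rules of any
# published stemmer.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ات", "ون", "ين", "ان", "ية", "ة", "ه", "ي"]
MIN_STEM = 2  # refuse to strip an affix if fewer than 2 letters would remain


def light_stem(word):
    """Strip the longest matching prefix, then the longest matching suffix."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[: -len(s)]
            break
    return word


print(light_stem("والكتاب"))   # "and the book" -> كتاب
print(light_stem("المكتبات"))  # "the libraries" -> مكتب
print(light_stem("كتب"))       # no affix matched -> كتب
```

Affix stripping alone leaves `مكتب` and `كتاب` as different stems even though both derive from the root ك-ت-ب; root-oriented stemmers such as ISRI try to go further, at the cost of conflating more unrelated words.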
“…Selecting the appropriate linguistic feature that can represent the original text is still intriguing researchers working on Arabic text classification [19], [20]. Therefore, we used 10 different linguistic features to represent the tweets' text.…”
Section: Effect of Text Processing Techniques (mentioning)
“…Removing stop words reduces the storage space required to store the identified tokens. Moreover, many studies have shown that removing stop words improves the efficiency and effectiveness of Arabic IR systems [5], [11].…”
Section: Stopping (mentioning)
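The storage argument for stop-word removal can be seen in a few lines of Python. This sketch assumes whitespace tokenization and a tiny hypothetical stop list (`STOP_WORDS`); practical Arabic stop lists are much longer and usually curated per task.

```python
# Illustrative stop list (an assumption, not a standard Arabic list).
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "أن", "هذا", "هذه", "التي", "الذي"}


def remove_stop_words(text):
    """Whitespace-tokenize text and drop every token found in STOP_WORDS."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


sentence = "ذهب الطالب إلى المكتبة في الصباح"  # "The student went to the library in the morning"
tokens = sentence.split()
kept = remove_stop_words(sentence)
print(len(tokens), len(kept))  # 6 4
print(kept)
```

Two of the six tokens are function words, so a third of the entries that an index would otherwise store disappear before any further processing.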
“…Lemmatization takes a more complex approach to text processing: it aims to regroup semantically related words, and it has proved beneficial in Arabic information retrieval [11], [14]. However, lemmatization is a more difficult task in Arabic, due to the morphological complexity of the language itself and the absence of short vowels in most existing Arabic documents [15].…”
Section: Lemmatization (mentioning)
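Both points in the statement above, regrouping related forms and the ambiguity caused by missing short vowels, can be illustrated with a dictionary-lookup sketch. The mini-lexicon `LEXICON` and the helpers `lemma_candidates` and `group_by_lemma` are hypothetical; real Arabic lemmatizers rely on a full morphological analyzer and sentence context rather than a hand-written table.

```python
from collections import defaultdict

# Hypothetical mini-lexicon: undiacritized surface form -> candidate lemmas.
LEXICON = {
    "الكتاب": ["كتاب"],      # "the book"
    "كتابان": ["كتاب"],      # "two books" (dual)
    "كتب": ["كتاب", "كتب"],  # "kutub" (books, broken plural) or "kataba" (he wrote)
    "يكتب": ["كتب"],         # "he writes"
}


def lemma_candidates(form):
    """Return candidate lemmas; unknown forms are treated as their own lemma."""
    return LEXICON.get(form, [form])


def group_by_lemma(tokens):
    """Regroup tokens under one lemma each, naively taking the first candidate.

    For an ambiguous form such as كتب this guess may be wrong; choosing the
    right candidate is the step that missing short vowels make hard."""
    groups = defaultdict(list)
    for tok in tokens:
        groups[lemma_candidates(tok)[0]].append(tok)
    return dict(groups)


print(lemma_candidates("كتب"))
print(group_by_lemma(["الكتاب", "كتابان", "كتب", "يكتب"]))
```

Unlike affix stripping, the lookup maps the broken plural كتب back to كتاب, which no prefix or suffix rule could do; the price is a lexicon and a disambiguation step.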
“…Documents in the bag-of-words model that contained no words were removed, along with their labels. A supervised classification model based on support vector machines (SVM) was trained on the word-frequency counts from the bag-of-words model and the labels [21], [22]. A multiclass linear classifier takes the bag-of-words counts as the predictors and the event-type labels as the response.…”
Section: Thematic Classification and Machine Learning (mentioning)
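The pipeline in the last statement (bag-of-words counts, dropping empty documents with their labels, then a multiclass linear SVM) can be sketched in plain Python. Several choices here are assumptions for illustration only: one-vs-rest decomposition, Pegasos-style subgradient training of the hinge loss with no bias term, the parameter values `lam=0.01` and `epochs=200`, and the toy documents and labels. In practice a library implementation (for example scikit-learn's `CountVectorizer` with `LinearSVC`) would be the usual choice.

```python
from collections import Counter


def bag_of_words(docs):
    """Build a sorted vocabulary, a word->column index, and count vectors."""
    vocab = sorted({tok for d in docs for tok in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for tok, c in Counter(d.split()).items():
            v[index[tok]] = float(c)
        vectors.append(v)
    return vocab, index, vectors


def drop_empty(vectors, labels):
    """Remove documents that contain no words, together with their labels."""
    kept = [(v, y) for v, y in zip(vectors, labels) if any(v)]
    return [v for v, _ in kept], [y for _, y in kept]


def train_binary_svm(X, y, lam=0.01, epochs=200):
    """Pegasos subgradient descent on the L2-regularized hinge loss (y in {+1, -1})."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrink step
            if margin < 1.0:  # hinge loss active: move w toward yi * xi
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def train_one_vs_rest(X, labels, **kw):
    """Train one binary SVM per class: that class is +1, all others -1."""
    return {c: train_binary_svm(X, [1 if l == c else -1 for l in labels], **kw)
            for c in sorted(set(labels))}


def predict(models, index, text):
    """Return the class whose linear score is highest; unknown words are ignored."""
    x = [0.0] * len(index)
    for tok in text.split():
        if tok in index:
            x[index[tok]] += 1.0
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], x)))


docs = [
    "مباراة فريق هدف", "فريق لاعب مباراة",        # sports
    "سوق سهم بنك", "بنك سوق استثمار",             # economy
    "",                                           # empty: dropped with its label
    "انتخابات برلمان حكومة", "حكومة وزير انتخابات",  # politics
]
labels = ["sports", "sports", "economy", "economy", "economy", "politics", "politics"]

vocab, index, X = bag_of_words(docs)
X, y = drop_empty(X, labels)
models = train_one_vs_rest(X, y)
print(len(X), len(y))                    # 6 6
print(predict(models, index, "هدف لاعب"))  # sports
```

One-vs-rest is only one way to obtain a multiclass linear classifier from binary SVMs; a one-vs-one scheme or a single multiclass hinge loss would also fit the description in the statement.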