2014
DOI: 10.5120/16223-5674

Arabic Text Classification Algorithm using TFIDF and Chi Square Measurements

Abstract: Text categorization is the process of classifying documents into a predefined set of categories based on the keywords they contain. Text classification is an extended type of text categorization where the text is further categorized into subcategories. Many algorithms have been proposed and implemented to solve the problem of English text categorization and classification. However, few studies have been carried out on categorizing and classifying Arabic text. Compared to English, the Arabic text classification…

Cited by 25 publications (17 citation statements)
References 7 publications
“…Term frequency-inverse document frequency is a weighting scheme used in text mining [7]. It is a statistical method which is used to give weightage to each word in a document.…”
Section: B Tf-idf (mentioning)
confidence: 99%
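
The weighting scheme described in this statement can be sketched in a few lines. This is a minimal illustration, assuming one common TF-IDF variant (relative term frequency multiplied by an unsmoothed logarithmic inverse document frequency); the cited paper may use a different formula. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed variant (not taken from the
    cited paper): tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus of pre-tokenized documents.
docs = [["a", "b"], ["a", "c"]]
weights = tf_idf(docs)
```

Under this variant a term that occurs in every document receives weight zero, so only terms that help distinguish documents carry weight.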
“…In other words, the NB classifier assumes that the absence of a class feature is unrelated to the absence of other features. NB is commonly used to classify documents because it performs well in classification: it computes the probability that a document belongs to each class and then assigns the document to the class with the highest probability [19].…”
Section: Naïve Bayesian Algorithm (mentioning)
confidence: 99%
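
The decision rule described in this statement (score every class, pick the most probable one) can be sketched as follows. This is a minimal illustration, assuming a multinomial Naive Bayes model with add-one (Laplace) smoothing, which is an assumption and not necessarily the variant used in the citing paper. The names `train_nb`, `classify_nb`, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect class counts, per-class term counts, and the vocabulary."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def classify_nb(tokens, model):
    """Return the class with the highest log posterior score."""
    class_counts, term_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        total = sum(term_counts[label].values())
        # Log prior plus the sum of smoothed log likelihoods.
        score = math.log(n / n_docs)
        for t in tokens:
            if t in vocab:  # terms never seen in training are ignored
                score += math.log(
                    (term_counts[label][t] + 1) / (total + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set of pre-tokenized documents.
train_docs = [["goal", "match"], ["match", "team"],
              ["vote", "party"], ["party", "election"]]
train_labels = ["sport", "sport", "politics", "politics"]
model = train_nb(train_docs, train_labels)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.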
“…Information gain measures the number of bits of information obtained for class prediction by knowing whether a feature occurs or not, while chi-square measures the lack of independence between a feature and a class and can be compared against the chi-square distribution with one degree of freedom to judge extremeness [21, 32–35]. Inspired by previous feature selection studies [21, 32–39], we focus on the information gain (IG) and chi-square (CHI) feature selection methods and compute the IG and CHI values of each feature. In other words, we try to compose the set of the most significant features with high classification success for associating to the original feature space.…”
Section: Proposed Framework (mentioning)
confidence: 99%
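
The chi-square score mentioned in this statement can be sketched for a single term and class. This is a minimal illustration, assuming the standard 2×2 contingency form, chi2 = N(AD − BC)² / ((A+B)(C+D)(A+C)(B+D)), where A, B, C, D count documents by term presence and class membership; the citing paper does not spell out its exact formula. The function name `chi_square` and the toy data are hypothetical.

```python
def chi_square(docs, labels, term, target):
    """Chi-square statistic between a term and a target class.

    a: in class, contains term      b: not in class, contains term
    c: in class, lacks term         d: not in class, lacks term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero denominator means a row or column of the table is empty,
    # so the term carries no evidence about the class.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Hypothetical toy corpus of pre-tokenized, labeled documents.
chi_docs = [["goal", "match"], ["match", "team"],
            ["vote", "party"], ["party", "election"]]
chi_labels = ["sport", "sport", "politics", "politics"]
score_match = chi_square(chi_docs, chi_labels, "match", "sport")
score_goal = chi_square(chi_docs, chi_labels, "goal", "sport")
```

With one degree of freedom, a score above roughly 3.841 is significant at the 0.05 level. Feature selection then typically keeps the terms with the highest scores per class.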