Arabic Text Classification Algorithm using TFIDF and Chi Square Measurements

Abu-Errub, Aymen

doi:10.5120/16223-5674

Cited by 25 publications

(17 citation statements)

References 7 publications

“…Term frequency-inverse document frequency is a weighting scheme used in text mining [7]. It is a statistical method which is used to give weightage to each word in a document.…”

Section: B. TF-IDF (mentioning)

confidence: 99%

Feature Selection Methods for Mining Social Media

Mageshwari, Aroquiaraj

2019

IJITEE

People share their thoughts and opinions through social media, where they can spread widely and quickly. Many public issues and political views are also discussed on social media, and HIV/AIDS is one of the important topics. This work aims to classify HIV/AIDS-related Twitter data. Because Twitter data is high-dimensional, reducing its dimensionality is essential for better classification results. Tweets are collected using keyword search, and the necessary preprocessing steps are carried out. Feature extraction methods such as the Bag of Words (BOW) model and TF-IDF are then implemented, and Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) are used for dimensionality reduction. Finally, classification is carried out and the results are discussed.
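The abstract above describes a pipeline of bag-of-words (BOW) counts, TF-IDF weighting, and SVD-based dimensionality reduction. Below is a minimal pure-Python sketch of that pipeline, not the authors' implementation. It assumes whitespace tokenization, tf normalized by document length, and idf = log(N / df), which is one common TF-IDF variant. The function names `bag_of_words`, `tfidf`, and `truncated_svd` are hypothetical. The SVD is computed by power iteration with deflation rather than by a library routine, which is only practical for toy-sized matrices.

```python
import math
import random


def bag_of_words(docs):
    """Bag-of-words: whitespace tokens -> sorted vocabulary + document-term count matrix."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    counts = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok in toks:
            row[index[tok]] += 1
        counts.append(row)
    return vocab, counts


def tfidf(counts):
    """TF-IDF weights: tf = count / document length, idf = log(N / df)."""
    n_docs, n_terms = len(counts), len(counts[0])
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in counts:
        length = sum(row)
        weighted.append([(c / length) * math.log(n_docs / df[j]) if c else 0.0
                         for j, c in enumerate(row)])
    return weighted


def truncated_svd(matrix, k, iters=300, seed=0):
    """Project each row onto the top-k right singular vectors of `matrix`.

    The right singular vectors are the top eigenvectors of the Gram matrix A^T A,
    found here by power iteration with deflation (suitable for small matrices only).
    """
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining spectrum is zero: k exceeds the matrix rank
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T G v = eigenvalue (a squared singular value).
        eigval = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
        components.append(v)
        # Deflate: subtract the found direction so the next pass finds the next one.
        gram = [[gram[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [[sum(a * b for a, b in zip(row, comp)) for comp in components] for row in matrix]


docs = ["hiv aids awareness campaign", "aids treatment news", "election news today"]
vocab, counts = bag_of_words(docs)
reduced = truncated_svd(tfidf(counts), k=2)
print(len(reduced), len(reduced[0]))  # prints: 3 2
```

A term that occurs in every document gets idf = log(1) = 0, so it contributes nothing to the weighted matrix. PCA, the other reduction technique mentioned, differs mainly in centering each term column on its mean before the decomposition.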

“…In other words, NB classifier assumes that the absence of class feature is unrelated to the absence of other features. NB is commonly used to classify documents due to that is given a good performance in classification, NB computes the probability of documents that related to classify them into different classes, and then assigns them to the specific class with the highest probability [19].…”

Section: Naïve Bayesian Algorithm (mentioning)

confidence: 99%

A Survey of Arabic Text Classification Models

Al-Sbou

2018

IJECE

A huge amount of Arabic text is available online, and it needs to be organized. As a result, there are many natural language processing (NLP) applications concerned with text organization; one of them is text classification (TC). TC makes unorganized text easier to deal with by classifying it into suitable classes or labels. This paper is a survey of Arabic text classification. It also compares different methods for classifying Arabic texts, since Arabic text is complex owing to its vocabulary. Arabic is one of the richest languages in the world and has many linguistic bases, yet research on Arabic language processing is very limited compared to English. These problems pose challenges in the classification and organization of Arabic text. Text classification assigns documents to one or more predefined classes or categories, which makes documents and information that have already been classified easier to access. In addition, classifying documents helps search engines reduce the number of documents they must consider, which makes searching and matching queries easier.
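The quoted description of naive Bayes (NB) can be made concrete with a small text classifier. In the standard formulation, word occurrences are assumed to be conditionally independent given the class. A document's score for class c is therefore log P(c) plus the sum of log P(w | c) over its words, and the document is assigned the class with the highest score. The sketch below assumes the multinomial variant with add-one (Laplace) smoothing and whitespace tokens. The survey excerpt does not say which NB variant the compared studies used, and the class name `NaiveBayesText` is hypothetical, not a library API.

```python
import math
from collections import Counter


class NaiveBayesText:
    """Multinomial naive Bayes for whitespace-tokenized text, scored in log space."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing (Laplace when alpha = 1)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc.split()}
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # log P(c): fraction of training documents labelled c
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in d.split())
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            # log P(w | c), smoothed so unseen (word, class) pairs are not zero
            self.log_likelihood[c] = {
                tok: math.log((counts[tok] + self.alpha) / total) for tok in self.vocab
            }
        return self

    def predict(self, doc):
        def score(c):
            # Conditional independence: log P(c) + sum of log P(w | c);
            # words outside the training vocabulary are skipped.
            return self.log_prior[c] + sum(
                self.log_likelihood[c][tok] for tok in doc.split() if tok in self.vocab
            )
        return max(self.classes, key=score)


train = ["goal match team", "team wins match", "election vote party", "party wins vote"]
labels = ["sport", "sport", "politics", "politics"]
clf = NaiveBayesText().fit(train, labels)
print(clf.predict("match goal"))  # prints: sport
```

Working in log space avoids floating-point underflow when many small per-word probabilities are multiplied. Smoothing keeps a single unseen (word, class) pair from forcing a class probability to zero.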

“…Information gain evaluates the number of bits of information obtained for class prediction by knowing the occurrence or nonoccurrence of a feature while chi-square interprets the lack of independence between feature and class and can be checked the distribution of chi-square with one degree of freedom to judge extremeness [21, 32–35]. Inspired from previous feature selection studies [21, 32–39], we intensify on information gain (IG) and chi-square (CHI) feature selection methods and figure out IG and CHI values of each features. In other words, we try to compose the set of the most significant features with high classification success for associating to the original feature space.…”

Section: Proposed Framework (mentioning)

confidence: 99%

The Impact of Enhanced Space Forests with Classifier Ensembles on Biomedical Dataset Classification

Kilimci

2018

IJISAE

In this paper, we propose to improve the classification success of classifier ensembles by investigating the contribution of enhanced space forests. To this end, the study focuses on enhanced feature spaces built with two of the most popular feature selection techniques, information gain and chi-square. After these methods are applied to the original feature space, the training phase is evaluated with the original features and with modified versions of the most significant features. These modified versions are obtained by applying a difference operator to the original features and to the features chosen by the feature selection methods. In other words, the new training dataset combines the original features with the new ones. Training is then performed on the enhanced feature space with a well-known classification algorithm, the decision tree. Finally, three types of ensemble algorithms are applied: bagging, random subspace, and random forest. A wide range of comparative experiments is conducted on 36 publicly available and widely used datasets from the UCI machine learning repository to observe the impact of enhanced space forests with classifier ensembles. Experimental results demonstrate that the proposed enhanced space forests achieve better classification accuracy than state-of-the-art studies. An improvement of approximately 1% to 3% in classification success indicates that the proposed technique is efficient.
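The quoted passage describes two filter-style feature selection scores. For a term t and class c, chi-square is computed from a 2×2 table of document counts. A is the number of documents in c that contain t, B is the number outside c that contain t, C is the number in c without t, and D is the number of documents that are neither in c nor contain t. The score is χ² = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)), and it can be compared with the chi-square distribution with one degree of freedom (critical value ≈ 3.841 at p = 0.05). Information gain is the class entropy minus the expected class entropy after observing whether t occurs. Below is a minimal pure-Python illustration. It assumes binary term presence, whitespace tokens, and a per-term score taken as the maximum chi-square over classes, which is one common aggregation; the cited works may combine classes differently. All function names are hypothetical.

```python
import math
from collections import Counter

CHI2_CRITICAL_1DF = 3.841  # chi-square critical value, 1 degree of freedom, p = 0.05


def contingency(docs, labels, term, cls):
    """Document counts A, B, C, D for (term present?) x (document in cls?)."""
    a = b = c = d = 0
    for doc, y in zip(docs, labels):
        has_term, in_class = term in doc.split(), y == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(docs, labels, term, cls):
    """N (AD - CB)^2 / ((A + C)(B + D)(A + B)(C + D)); 0 when a margin is empty."""
    a, b, c, d = contingency(docs, labels, term, cls)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def entropy(labels):
    """Class entropy in bits."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values()) if n else 0.0


def information_gain(docs, labels, term):
    """H(C) minus the expected class entropy after observing whether term occurs."""
    with_t = [y for doc, y in zip(docs, labels) if term in doc.split()]
    without_t = [y for doc, y in zip(docs, labels) if term not in doc.split()]
    n = len(labels)
    expected = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - expected


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score (max over classes)."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    classes = sorted(set(labels))
    scored = [(max(chi_square(docs, labels, t, c) for c in classes), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [t for _, t in scored[:k]]


docs = ["goal match team", "team wins match", "election vote party", "party wins vote"]
labels = ["sport", "sport", "politics", "politics"]
print(select_features(docs, labels, 2))  # prints: ['match', 'party']
print(chi_square(docs, labels, "match", "sport") > CHI2_CRITICAL_1DF)  # prints: True
print(information_gain(docs, labels, "wins"))  # prints: 0.0
```

A term such as "wins", which appears equally often in both classes, scores zero on both measures. With only four documents, however, the expected cell counts are far too small for the chi-square approximation to be reliable, so the toy data illustrates only the arithmetic.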
