2018
DOI: 10.5120/ijca2018916800
A Comparative Study on using Principle Component Analysis with different Text Classifiers

Abstract: Text categorization (TC) is the task of automatically organizing a set of documents into a set of pre-defined categories. Over the last few years, increased attention has been paid to documents in digital form, which makes text categorization a challenging issue. The most significant problem of text categorization is its huge number of features. Most of these features are redundant, noisy, and irrelevant, causing overfitting with most classifiers. Hence, feature extraction is an i…

Cited by 11 publications (15 citation statements). References 18 publications (19 reference statements).
“…Furthermore, calculating the TF-IDF weight of a term in a particular document requires calculating the term frequency [TF(t, d)], which is the number of times that the word t occurred in document d; the document frequency [DF(t)], which is the number of documents in which term t occurs at least once; and the inverse document frequency (IDF), which can be calculated from DF using the following formula. The IDF of a word is considered high if it occurred in a few documents and low if it occurred in many documents (Ahmed Taloba et al., 2018). The TF-IDF model is defined in Equations (2) and (3)…”
Section: Data Preprocessing (mentioning)
confidence: 99%
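The TF-IDF weighting described in the statement above can be sketched in a few lines of Python. This is a minimal illustration assuming the common formulation IDF(t) = log(N / DF(t)); the exact form of Equations (2) and (3) in the cited paper may differ (e.g., smoothing terms), and the toy corpus and names below are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; each document is a list of tokens.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats are pets".split(),
]

N = len(docs)

# DF(t): number of documents in which term t occurs at least once.
df = Counter()
for d in docs:
    df.update(set(d))

# IDF(t) = log(N / DF(t)): high for terms in few documents, low for terms in many.
idf = {t: math.log(N / df[t]) for t in df}

# TF(t, d): raw count of term t in document d; TF-IDF(t, d) = TF(t, d) * IDF(t).
def tfidf(doc):
    tf = Counter(doc)
    return {t: tf[t] * idf[t] for t in tf}

for i, d in enumerate(docs):
    print(i, tfidf(d))
```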
“…On the other hand, in feature extraction methods, a new vector space with special characteristics is created by transforming the original vector space. The features are reduced in the new vector space [32].…”
Section: B. Dimensionality Reduction Methods (mentioning)
confidence: 99%
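The transformation described above can be sketched with a standard PCA implementation. This is only an illustrative assumption using scikit-learn on random data; the cited work's actual data, dimensionality, and PCA variant are not specified here.

```python
# Minimal sketch of feature extraction by projection into a new vector space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((100, 300))          # 100 documents x 300 original features

pca = PCA(n_components=50)          # new space with 50 derived features
X_reduced = pca.fit_transform(X)    # each new feature is a linear combination
                                    # of the original ones
print(X_reduced.shape)              # (100, 50)
```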
“…One of the most popular statistical techniques for feature selection is PCA [32]. The discriminative power of the classifiers can be enhanced by utilizing PCA.…”
Section: Introduction (mentioning)
confidence: 99%
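The claim above, that PCA can enhance a classifier's discriminative power, is typically tested by comparing a classifier on the raw features against a PCA-then-classifier pipeline. The sketch below assumes synthetic data and a logistic regression stand-in; the cited study evaluates different text classifiers, so this is illustrative only.

```python
# Compare a classifier on raw features vs. on PCA-reduced features.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic high-dimensional data standing in for document vectors.
X, y = make_classification(n_samples=500, n_features=300, n_informative=30,
                           random_state=0)

baseline = LogisticRegression(max_iter=2000)
with_pca = make_pipeline(PCA(n_components=30), LogisticRegression(max_iter=2000))

print("raw features:", cross_val_score(baseline, X, y, cv=5).mean())
print("PCA features:", cross_val_score(with_pca, X, y, cv=5).mean())
```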
“…We decided to use the kNN classification approach as data was recorded from a single CI channel over a one-second period, resulting in a low number of features (i.e. frequency points per band), a classification problem usually solved better by a kNN approach (Eisa et al., 2018). At first, a subject's data was standardized to unit variance and zero mean.…”
Section: Decoding Analysis (mentioning)
confidence: 99%
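The workflow in the statement above, standardizing features to zero mean and unit variance and then applying kNN, can be sketched as follows. The data, number of features, and choice of k below are hypothetical placeholders, not taken from the cited study.

```python
# Standardize features, then classify with k-nearest neighbours.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Low-dimensional feature vectors (e.g., a handful of frequency points per band).
X, y = make_classification(n_samples=200, n_features=8, n_informative=6,
                           random_state=0)

clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
print(cross_val_score(clf, X, y, cv=5).mean())
```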