2006
DOI: 10.1007/11940098_15
Discrimination-Based Feature Selection for Multinomial Naïve Bayes Text Classification

Cited by 6 publications (6 citation statements)
References 6 publications
“…Feature selection methods are generally preferred for dimensionality reduction in text-processing applications. Among the best-known and most widely used feature selection algorithms are document frequency, the chi statistic, information gain, term strength, and mutual information [4]. All of these are filter methods, and they screen features according to similar entropy-based scoring.…”
Section: Feature Selection Methods
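The excerpt above lists filter-style feature selection scores such as information gain, which rank terms by how much knowing a term's presence reduces class entropy. As a minimal illustrative sketch (not the cited paper's method; the toy documents and labels are invented for the example):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Entropy reduction from splitting the corpus on presence of `term`.

    `docs` is a list of token sets, `labels` the parallel class labels.
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - conditional

# Toy corpus: "ball" perfectly separates the classes, "goal" does not.
docs = [{"ball", "goal"}, {"ball", "match"}, {"vote", "party"}, {"party", "goal"}]
labels = ["sport", "sport", "politics", "politics"]
print(information_gain(docs, labels, "ball"))  # 1.0 bit: perfect split
print(information_gain(docs, labels, "goal"))  # 0.0 bits: uninformative
```

A filter method would compute such a score for every term and keep only the top-k, independently of the downstream classifier.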
“…Feature extraction creates a subset of new features by combining existing features, while feature selection chooses a subset of the existing features that is more informative (more relevant to the target class). Both are used as a preprocessing stage for classification to improve its accuracy, reduce the memory and processing time it requires, and lower the cost of gathering data; note that irrelevant features can act as noise that decreases the accuracy of the classification process [13].…”
Section: Dimensionality Reduction
“…Feature selection methods, in turn, divide into two types: univariate and multivariate. Univariate methods evaluate the relevance of each feature individually, measuring its discriminatory power (its ability to discriminate between the classes) [13]; each feature is considered one at a time. Examples of univariate methods are the chi-square method and the mutual information (MI) method [16], which measures the dependency between each feature f and the target class c, where f and c are independent if P(f, c) = P(f) P(c).…”
Section: Dimensionality Reduction
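The independence condition P(f, c) = P(f)P(c) quoted above is exactly what mutual information measures the deviation from: MI is zero when the factorization holds and grows with dependence. A minimal sketch, computing MI from a 2x2 contingency table of a binary feature against a binary class (the function name and table layout are assumptions for illustration):

```python
import math

def mutual_information(n11, n10, n01, n00):
    """MI (in bits) between binary feature f and class c.

    n11: docs containing f in class c      n10: docs containing f not in c
    n01: docs without f in class c         n00: docs without f not in c
    """
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # Each cell contributes P(f,c) * log2( P(f,c) / (P(f) * P(c)) ).
    for n_fc, n_f, n_c in [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]:
        if n_fc:  # 0 * log(0) is taken as 0
            mi += (n_fc / n) * math.log2(n * n_fc / (n_f * n_c))
    return mi

print(mutual_information(1, 1, 1, 1))  # 0.0: feature independent of class
print(mutual_information(2, 0, 0, 2))  # 1.0: feature determines class
```

When the table is uniform, P(f, c) = P(f)P(c) in every cell and every log term vanishes, matching the independence criterion in the quotation.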
“…Recently, a large number of scholars have studied text classification. Traditional classification models include K-nearest neighbors (KNN) [11], naive Bayes (NB) [12], and support vector machines (SVM) [13]. These models give good classification results and have been widely used.…”
Section: Introduction
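The multinomial naive Bayes model named in the cited paper's title, and listed among the traditional classifiers above, can be sketched in a few lines: class priors plus Laplace-smoothed per-class term likelihoods, with prediction by maximum log-posterior. This is a generic textbook formulation on an invented toy corpus, not the paper's specific method:

```python
import math
from collections import Counter

def train_mnb(docs, labels, alpha=1.0):
    """Multinomial NB with Laplace smoothing; docs are token lists."""
    classes = set(labels)
    vocab = {t for d in docs for t in d}
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        loglik[c] = {t: math.log((counts[c][t] + alpha) / total) for t in vocab}
    return prior, loglik

def predict(doc, prior, loglik):
    """Return the class maximizing log P(c) + sum of log P(t|c); OOV tokens skipped."""
    def score(c):
        return math.log(prior[c]) + sum(loglik[c][t] for t in doc if t in loglik[c])
    return max(prior, key=score)

docs = [["goal", "ball"], ["ball", "match"], ["vote", "party"], ["party", "law"]]
labels = ["sport", "sport", "politics", "politics"]
prior, loglik = train_mnb(docs, labels)
print(predict(["ball", "goal"], prior, loglik))  # "sport"
print(predict(["vote", "law"], prior, loglik))   # "politics"
```

Feature selection methods like those discussed in the earlier excerpts would typically shrink `vocab` before this training step, discarding low-scoring terms.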