The increase in the number of texts as digital documents from numerous sources such as customer reviews, news, and social media has made text categorization a crucial in order to be able to manage the enormous amount of data. The high dimensional nature of these texts requires a preliminary feature selection task to reduce the feature space with a potential increase in the prediction accuracy. In this study, we developed an ensemble feature selection method, namely majority vote rank allocation, was developed for Turkish text categorization purposes. The method uses a majority voting ensemble strategy in combination with a rank allocation approach to combine weak filters such as information gain, symmetric uncertainty, relief and correlation-based feature selection. Thus, the proposed method measures the quality of the features among all features with the majority votes of the filters and ranking allocation. The feature selection efficacy of the method was tested on two datasets, one from the literature and a newly collected dataset. The effect of the obtained features on the classification prediction performance was evaluated on top of the naive bayes, support vector machine, J48 and random forests algorithms. It was empirically observed that the developed method improved the prediction accuracies of the classifiers compared to the mentioned filters. The statistical significance of the experimental results were also validated with the use of a two-way Analysis of Variance test.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.