Text classification is the process of automatically assigning text documents to categories based on their content. It helps organize large numbers of published documents so that they can be found easily and quickly. Feature selection can improve text classification in terms of both learning speed and effectiveness. In the Chi-Square feature selection experiment, the combination of a 1% threshold and a parameter value of k=6 was chosen as the best model. When tested on new data, the K-Nearest Neighbor model with Chi-Square feature selection achieved precision, recall, F1-score, and accuracy of 85%, 83.3%, 88.2%, and 92.3%, respectively. In the Gini Index feature selection experiment, the combination of a 1% threshold and a parameter value of k=4 was chosen as the best model. This threshold selects about 31 features with the highest Gini Index values. When tested on new data, the K-Nearest Neighbor model with Gini Index feature selection achieved precision, recall, F1-score, and accuracy of 81.2%, 80.3%, 81.6%, and 86.6%, respectively.
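As a rough illustration of the pipeline the abstract describes, the sketch below scores terms with the Chi-Square statistic, keeps the top-scoring fraction of the vocabulary, and classifies a new document with K-Nearest Neighbor. It is not the authors' implementation. The function names, the toy corpus, and the reading of the "threshold" as a fraction of the vocabulary to keep are all assumptions made for this example. The toy vocabulary is tiny, so a 20% fraction and k=3 are used here instead of the 1% and k=6 reported above.

```python
import math
from collections import Counter

def chi_square(docs, labels, term, cls):
    """Chi-Square statistic for one (term, class) pair from a 2x2 table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cls = label == cls
        if has_term and in_cls:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document not in class
        elif in_cls:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document not in class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def select_features(docs, labels, fraction):
    """Keep the top `fraction` of terms, ranked by their best per-class score."""
    vocab = sorted({t for doc in docs for t in doc})
    classes = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in classes)
             for t in vocab}
    keep = max(1, math.ceil(fraction * len(vocab)))
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(vocab, key=lambda t: (-score[t], t))
    return ranked[:keep]

def vectorize(tokens, features):
    """Term-count vector restricted to the selected features."""
    counts = Counter(tokens)
    return [counts[f] for f in features]

def knn_predict(train_vecs, train_labels, vec, k):
    """Majority vote among the k training vectors closest in Euclidean distance."""
    dists = sorted((math.dist(v, vec), i) for i, v in enumerate(train_vecs))
    votes = Counter(train_labels[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus (already tokenized); not data from the study.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["match", "team", "win"],
    ["election", "vote", "party"],
    ["party", "vote", "policy"],
    ["policy", "election", "vote"],
]
labels = ["sport", "sport", "sport", "politics", "politics", "politics"]

features = select_features(docs, labels, fraction=0.2)
train_vecs = [vectorize(doc, features) for doc in docs]
prediction = knn_predict(train_vecs, labels,
                         vectorize(["team", "win", "goal"], features), k=3)
print(features, prediction)
```

The Gini Index variant would follow the same shape, with a different scoring function in place of `chi_square`. It is left out here because the abstract does not say which Gini Index formulation was used.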