Text classification is one of the key problems in text mining research, where documents are assigned to categories by supervised learning. Because the number of text classification algorithms is considerable, an overview of them is needed to simplify orientation among the classification tools currently available. Many text representation schemes and classification/learning algorithms for assigning text documents to predefined categories can be found in the literature, but some of them still need detailed analysis and have untapped potential. The purpose of this study is to provide an overview of different text representation schemes and a comparison of different classifiers used to assign text documents to predefined categories. The methodology relied on comparison: modern classification approaches were compared by criteria, algorithms used and time complexity. Methods of analysis, modelling and combination were also applied. As a result of the study, several algorithms or combinations of algorithms are proposed as hybrid approaches for the automatic classification of documents. In the comparison of supervised machine learning algorithms, the SVM (Support Vector Machine) classifier was recognised as one of the most effective text classification methods. It was concluded that SVM captures the inherent characteristics of the data and embeds the structural risk minimisation (SRM) principle, which minimises an upper bound on the generalisation error and is therefore preferable to the empirical risk minimisation principle, which minimises the training error alone.
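For orientation, the upper bound that the SRM principle refers to is commonly written in the form below. This is the standard statement from statistical learning theory, not a formula taken from the study itself, and the symbols are notation assumed here: R is the expected (generalisation) risk, R_emp the empirical (training) risk, h the VC dimension of the function class, n the number of training samples, and η the confidence parameter. The bound holds with probability at least 1 − η.

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

Empirical risk minimisation controls only the first term on the right, whereas SRM chooses the function class so that the sum of both terms is small.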