The growth of online information and the rapid expansion in the number of electronic documents available on websites and in electronic libraries have made text documents difficult to categorize. This study addresses the problem with a rule-based approach to document classification, combined with the document-to-vector (doc2vec) embedding technique. An experiment was performed on two data sets, Reuters-21578 and 20 Newsgroups, to classify the top ten categories of each using the document-to-vector rule-based algorithm (D2vecRule). The method gave good classification results according to the F-measure and implementation time metrics. D2vecRule also performed well compared with other algorithms, namely JRip, OneR, and ZeroR, applied to the same Reuters-21578 data set.
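The abstract reports results in terms of the F-measure without stating how it is averaged across categories. As a minimal sketch, assuming the standard F1 definition (the harmonic mean of precision and recall) and a macro average over categories, the metric could be computed as follows. The function names and the toy labels are illustrative assumptions; they are not the authors' code or data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for single-label predictions (illustrative helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels only; these are not results from the study.
y_true = ["earn", "earn", "acq", "acq", "crude"]
y_pred = ["earn", "acq", "acq", "acq", "crude"]
scores = per_class_f1(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

A macro average weights every category equally, which matters when the top ten categories differ widely in size; a micro average would instead be dominated by the largest categories.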
With the growth of the World Wide Web (WWW) and the expanding availability of electronic text documents, automatic text classification (ATC) has become more important for organizing information and knowledge. One of the most crucial tasks is document representation using word embedding and rule-based methodologies. These methodologies, along with their modeling methods, have become an essential step in improving natural language processing for text classification. This paper presents a systematic mapping study that surveys the primary studies on word embedding applied to rule-based and machine learning approaches to automatic text classification. The search procedure identified 20 articles as relevant to our research questions. The study maps what is currently known about word embedding in rule-based text classification (TC). The results show that research is concentrated in a few main areas: social sciences, shopping product classification, digital libraries, and spam filtering. The paper contributes to the literature by summarizing the research in the field of TC, and it can help other researchers and specialists sort information.