Purpose Owing to the huge volume of documents available on the internet, text classification becomes a necessary task to handle these documents. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify the personal computer documents based on their content. Design/methodology/approach This paper proposes a new algorithm for feature selection based on artificial bee colony (ABCFS) to enhance the text classification accuracy. The proposed algorithm (ABCFS) is scrutinized with the real and benchmark data sets, which is contrary to the other existing feature selection approaches such as information gain and χ2 statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and improved SVM classifier are used in this paper. Findings The experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents that were stored in the personal computer, and the benchmark data set was collected from Reuters and 20 Newsgroups corpus. The results prove the performance of the proposed feature selection algorithm by enhancing the text document classification accuracy. Originality/value This paper proposes a new ABCFS algorithm for feature selection, evaluates the efficiency of the ABCFS algorithm and improves the support vector machine. In this paper, the ABCFS algorithm is used to select the features from text (unstructured) documents. Although, there is no text feature selection algorithm in the existing work, the ABCFS algorithm is used to select the data (structured) features. The proposed algorithm will classify the documents automatically based on their content.
This article describes how privacy preserving data mining has become one of the most important and interesting research directions in data mining. With the help of data mining techniques, people can extract hidden information and discover patterns and relationships between the data items. In most of the situations, the extracted knowledge contains sensitive information about individuals and organizations. Moreover, this sensitive information can be misused for various purposes which violate the individual's privacy. Association rules frequently predetermine significant target marketing information about a business. Significant association rules provide knowledge to the data miner as they effectively summarize the data, while uncovering any hidden relations among items that hold in the data. Association rule hiding techniques are used for protecting the knowledge extracted by the sensitive association rules during the process of association rule mining. Association rule hiding refers to the process of modifying the original database in such a way that certain sensitive association rules disappear without seriously affecting the data and the non-sensitive rules. In this article, two new hiding techniques are proposed namely hiding technique based on genetic algorithm (HGA) and dummy items creation (DIC) technique. Hiding technique based on genetic algorithm is used for hiding sensitive association rules and the dummy items creation technique hides the sensitive rules as well as it creates dummy items for the modified sensitive items. Experimental results show the performance of the proposed techniques.
No abstract
Association rule mining is an important and widely used data mining technique. It is used to retrieve highly related objects in a database based on the occurrence. Recently, utility-based association rules were proposed to consider significant factors of the object. The main objective of this research work is to retrieve high utility association rules from a database using cockroach swarm optimization algorithm. So far, in the literature, no optimization algorithm was proposed in utility-based association rule mining. In this research work, CSOUAR (cockroach swarm optimization for high utility association rule mining) algorithm was proposed to generate utility association rules. CSOUAR algorithm is based on three behaviours of cockroach: chase-swarming, dispersing, and ruthless. To analyse the performance of CSOUAR, an improved particle swarm optimization (PSO-UAR), animal migration optimization (AMO-UAR), bees swarm optimisation (BSO-UAR), and penguins search optimisation (peSO-UAR) are also proposed in this work.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.