Machine learning techniques are widely used in medical decision support systems. Diagnostic procedures yield many features that represent the different variations of a disease, and these are likely to include relevant, irrelevant and redundant features. Redundant features contribute to misclassification of the disease, so removing them improves classification while also reducing the size of the data and the computational complexity. Identifying a good feature subset for effective classification is a non-trivial task, because it requires an exhaustive search over the sample space of the dataset. The main objective of this paper is to use a metaheuristic algorithm to determine the optimal feature subset with improved classification accuracy in cardiovascular disease diagnosis. The swarm-intelligence-based Artificial Bee Colony (ABC) algorithm is used to find the best features for disease identification, and Support Vector Machine (SVM) classification is used to evaluate the fitness of each candidate in ABC. The performance of the proposed algorithm is validated on the Cleveland Heart Disease dataset taken from the UCI Machine Learning Repository. The experimental results show that ABC-SVM performs better than feature selection with reverse ranking. The results also show that the proposed method obtains good classification accuracy with only seven features.
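The search described above can be illustrated with a minimal sketch of an ABC-style wrapper over binary feature masks. It is a simplified illustration under stated assumptions, not the paper's implementation: the function names, the parameters (`n_sources`, `limit`, `n_iter`), the single-bit-flip neighbourhood and the toy fitness function are all hypothetical. In the paper the fitness of a feature subset is the SVM classification accuracy; here a toy stand-in is used so that the example runs with the standard library alone.

```python
import random

# Hypothetical stand-in for the SVM fitness: rewards a fixed set of
# "relevant" feature indices and penalises the size of the subset.
RELEVANT = {0, 3, 5}

def toy_fitness(mask):
    selected = {i for i, keep in enumerate(mask) if keep}
    return len(selected & RELEVANT) - 0.1 * len(selected)

def abc_feature_selection(n_features, fitness, n_sources=10, limit=5,
                          n_iter=30, seed=0):
    """Simplified binary ABC: each food source is a feature mask."""
    rng = random.Random(seed)

    def random_mask():
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        if not any(mask):                      # keep at least one feature
            mask[rng.randrange(n_features)] = True
        return mask

    def neighbour(mask):
        new = list(mask)
        j = rng.randrange(n_features)
        new[j] = not new[j]                    # flip one feature in or out
        if not any(new):
            new[j] = True
        return new

    sources = [random_mask() for _ in range(n_sources)]
    scores = [fitness(m) for m in sources]
    trials = [0] * n_sources
    best_i = max(range(n_sources), key=scores.__getitem__)
    best_mask, best_score = list(sources[best_i]), scores[best_i]

    def try_improve(i):
        nonlocal best_mask, best_score
        cand = neighbour(sources[i])
        s = fitness(cand)
        if s > scores[i]:                      # greedy replacement
            sources[i], scores[i], trials[i] = cand, s, 0
            if s > best_score:
                best_mask, best_score = list(cand), s
        else:
            trials[i] += 1

    for _ in range(n_iter):
        # Employed bees: one local move per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: pick sources with probability tied to fitness.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_improve(i)
        # Scout bees: abandon sources that stopped improving.
        for i in range(n_sources):
            if trials[i] >= limit:
                sources[i] = random_mask()
                scores[i] = fitness(sources[i])
                trials[i] = 0
                if scores[i] > best_score:
                    best_mask, best_score = list(sources[i]), scores[i]

    return best_mask, best_score

mask, score = abc_feature_selection(8, toy_fitness)
print([i for i, keep in enumerate(mask) if keep], score)
```

To approximate the setting of the abstract, `toy_fitness` would be replaced by a function that trains an SVM on the selected columns and returns its cross-validated accuracy; that step is left out here because the abstract does not specify the kernel, parameters or validation protocol.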
Abstract: Privacy-preserving data mining has emerged as a new stream of research due to recent advances in data mining, Internet and security technologies. Data sharing among organizations is considered useful because it offers mutual benefit for business growth. Preserving the privacy of shared data for clustering is considered one of the most challenging problems. To overcome this problem, the data owner publishes the data after randomly modifying the original data in a certain way, disguising the sensitive information while preserving a particular data property. Data transformation techniques play a vital role in preserving privacy in data mining. We put forward an effective approach that addresses the privacy of confidential categorical data in clustering. A set of hybrid data transformations (HDTTR and HDTSR) is introduced, and the effectiveness of the approach is analyzed. A complete analysis of the proposed approach and a formal study of the problem are presented. The proposed approach illustrates the effectiveness of clustering sensitive categorical data before and after the transformation.