Purpose:
This paper compares five supervised learning algorithms (support vector machines, k-nearest neighbor, decision tree, random forest, and AdaBoost) for predicting heart disease and examines the impact of normalization and GridSearch hyper-parameter tuning on model performance.
Methods:
The study uses the Cleveland database from the University of California, Irvine (UCI) repository, comprising 918 patient records with 12 attributes. Eleven attributes serve as predictors and one is the target class. Models are built and tested on this dataset.
Results:
Model accuracies range from 89.13% to 91.85% and exceed those reported in the existing literature. AdaBoost performs best and the decision tree performs worst. Normalization improves prediction performance by 17% for support vector machines (SVM) and 14% for k-nearest neighbor (kNN). SVM does not benefit from GridSearch, whereas GridSearch improves the decision tree and AdaBoost by 7% and 4%, respectively. Normalization combined with GridSearch improves kNN and random forest by 2–3%.
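Distance-based models such as kNN can be sensitive to normalization because a feature with a large numeric range dominates the distance. The following is a minimal sketch of that effect, not the study's code: the four records are made-up values, the helper names `min_max_scale` and `nearest_index` are hypothetical, and min-max scaling is an assumption, since the abstract does not say which normalization was used.

```python
import math

def min_max_scale(rows):
    """Rescale every column to [0, 1] using that column's min and max."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def nearest_index(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index], excluding itself."""
    query = rows[query_index]
    best, best_dist = None, math.inf
    for i, row in enumerate(rows):
        if i == query_index:
            continue
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, row)))
        if dist < best_dist:
            best, best_dist = i, dist
    return best

# Made-up records: [small-range feature, large-range feature].
rows = [
    [0.0, 200.0],  # query record
    [0.2, 260.0],  # close on the small-range feature
    [4.0, 205.0],  # close on the large-range feature only
    [6.0, 400.0],  # extreme record that sets both column maxima
]

raw_neighbor = nearest_index(rows, 0)
scaled_neighbor = nearest_index(min_max_scale(rows), 0)
print("nearest neighbor on raw data:", raw_neighbor)
print("nearest neighbor after scaling:", scaled_neighbor)
```

On the raw values the large-range column decides the neighbor; after scaling, both columns contribute comparably and the neighbor changes. This only illustrates the mechanism and says nothing about the size of the gains reported above.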
Conclusion:
Among the five supervised learning algorithms compared for heart disease prediction, AdaBoost performs best and the decision tree performs worst. The results exceed those reported in the literature. Normalization substantially improves performance for SVM and kNN, GridSearch improves the decision tree and AdaBoost, and the two combined improve kNN and random forest. These results can inform algorithm selection for heart disease prediction and guide future research.