Disease diagnosis is of the utmost importance in providing appropriate medical treatment. Genetic diseases, such as hemoglobinopathies and thalassemia, need to be diagnosed accurately and on time. Although Hb variants are diagnosed using an HPLC-based hemoglobin typing machine, appropriate interpretation of the data obtained is still necessary, and this requires trained professionals. Machine learning helps to interpret the obtained data and to predict the type of Hb variant, thus reducing the workload of health professionals. In this study, the obtained data are classified using the following classifiers: logistic regression, support vector classifier (SVC), k-nearest neighbor (KNN), Gaussian naïve Bayes, perceptron classifier, linear SVC, stochastic gradient descent, decision tree, random forest, and multi-layer perceptron. The pre-processing, visualization, and classification steps were implemented using Python 2.7 on an Intel Core i5 computer. The performance of each classifier was then tested, starting with the construction of a confusion matrix. The indices "precision," "recall," and "f1-score" were used to quantify the quality of each model. KNN, decision tree, and random forest show better classification results than the other classifiers. With a precision of 93.89%, recall of 92.78%, and f1-score of 93.33%, the decision tree and random forest classifiers prove to be the better classifiers for predicting Hb variants, with a higher accuracy rate.
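As an illustration of the evaluation step described above, the following minimal sketch builds a confusion matrix from true and predicted labels and derives per-class precision, recall, and f1-score from it. It is not the study's code: the class labels "A", "B", and "C", the toy label lists, and the function names `confusion_matrix` and `per_class_scores` are hypothetical, and the sketch is written for Python 3 using only the standard library, whereas the study used Python 2.7.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_scores(matrix, labels):
    """Return {label: (precision, recall, f1)} from a confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])                    # row sum
        precision = true_positives / predicted_total if predicted_total else 0.0
        recall = true_positives / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy data: three placeholder Hb variant classes.
labels = ["A", "B", "C"]
y_true = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "B", "C", "C", "C", "C", "A"]

matrix = confusion_matrix(y_true, y_pred, labels)
scores = per_class_scores(matrix, labels)

for label in labels:
    precision, recall, f1 = scores[label]
    print(f"{label}: precision={precision:.2%} recall={recall:.2%} f1={f1:.2%}")
```

The abstract does not say how the per-class values were averaged into the single reported figures, so the sketch stops at per-class scores rather than assuming macro or weighted averaging.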