Prediction of Clinical Disease with AI-Based Multiclass Classification Using Naïve Bayes and Random Forest Classifier

Jackins, V.; Vimal, S.; Kaliappan, M.; Lee, Mi Young

doi:10.1007/978-3-030-70296-0_63

Cited by 2 publications

(2 citation statements)

References 6 publications


“…In the healthcare industry vast amounts of data such as patient demographics, dietary patterns, medical history, lab test results and even medical imaging records [1] are constantly being produced through various methods such as sensors, surveys, AHS (advance healthcare systems), cameras, mobile applications and online applications [2]. These data can be used for early detection of diseases which can aid in improving the survival rate of patients and overall better quality of life.…”

Section: Introduction (mentioning)

confidence: 99%

Framework for Benefit-Based Multiclass Classification of Diseases

Sooklal, Hosein

2024

Preprint


Purpose: Health datasets are typically heavily skewed towards the healthy class, so classifiers err towards this majority class. Because of this imbalance, traditional performance metrics, such as accuracy, are not appropriate for evaluating classifier performance on the minority (disease-affected) class. In addition, classifiers are trained under the assumption that the costs or benefits associated with different decision outcomes are equal. This is usually not the case with health data, since the benefits and costs of correctly or incorrectly identifying disease-affected/unhealthy persons differ from those for healthy individuals. In this paper we address these problems by examining benefits/costs both when training and when evaluating the performance of classifiers. Furthermore, we focus on multiclass classification, where the outcome can be one of three or more options. Methods: We propose modifications to the Naive Bayes and Logistic Regression algorithms to incorporate costs and benefits when training for the multiclass scenario. We compare these to a recently proposed algorithm in the field, hierarchical cost-sensitive kernel logistic regression, and to an adapted hierarchical approach that uses our cost-benefit based logistic regression model. We demonstrate the effectiveness of all approaches for fetal health classification, vertebral column classification and hepatitis C/fibrosis/cirrhosis prediction. Results: Our proposed multiclass Logistic Regression algorithm outperformed all other algorithms, improving performance on the more critical classes. Conclusion: Our proposed multiclass Logistic Regression algorithm is robust and suitable for cases where costs and benefits of the various decision outcomes are important.
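
The abstract's point that decision outcomes carry unequal costs and benefits can be illustrated with a generic expected-benefit decision rule. The sketch below is not the authors' algorithm (the abstract says they modify training, which is not reproduced here); it only shows how a benefit matrix can change the predicted class relative to picking the most probable one. The class names, posterior probabilities, and benefit values are all hypothetical.

```python
# Illustrative only: a generic expected-benefit decision rule, not the
# method proposed in the cited preprint. All numbers are invented.

def expected_benefit_decision(posteriors, benefit):
    """Pick the prediction with the highest expected benefit.

    posteriors[j]  : estimated P(true class = j | x)
    benefit[i][j]  : benefit of predicting class i when the truth is class j
    Returns (index of best prediction, list of expected benefits).
    """
    n = len(posteriors)
    scores = [
        sum(benefit[i][j] * posteriors[j] for j in range(n))
        for i in range(n)
    ]
    best = max(range(n), key=lambda i: scores[i])
    return best, scores

# Hypothetical three-class problem.
classes = ["healthy", "mild", "severe"]

# Hypothetical posteriors from any probabilistic classifier.
posteriors = [0.70, 0.20, 0.10]

# Hypothetical benefit matrix: missing a severe case is penalised heavily.
benefit = [
    [1.0, -2.0, -10.0],  # predict healthy
    [-0.5, 2.0, -4.0],   # predict mild
    [-1.0, 0.5, 5.0],    # predict severe
]

most_probable = max(range(len(posteriors)), key=lambda j: posteriors[j])
best, scores = expected_benefit_decision(posteriors, benefit)

print("most probable class:", classes[most_probable])
print("benefit-based choice:", classes[best])
```

With these invented numbers the most probable class is "healthy", but the expected-benefit rule selects "severe" because the penalty for missing a severe case outweighs the lower probability.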


“…In the healthcare industry vast amounts of data such as patient demographics, dietary patterns, medical history, lab test results and even medical imaging records [8] are constantly being produced through various methods such as sensors, surveys, AHS (advance healthcare systems), cameras, mobile applications and online applications [2]. This data can be used for early detection of diseases which can aid in improving the survival rate of patients and overall better quality of life.…”

Section: Introduction (mentioning)

confidence: 99%

Framework for Benefit-Based Multiclass Classification

Sooklal, Hosein

2022

Preprint


Health datasets are typically heavily skewed towards the healthy class, so classifiers are biased towards this majority class. Because of this imbalance, traditional performance metrics, such as accuracy, are not appropriate for evaluating classifier performance on the minority class (disease-affected/unhealthy individuals). In addition, classifiers are trained under the assumption that the costs or benefits associated with different decision outcomes are equal. This is usually not the case with health data, since it is more important to identify disease-affected/unhealthy persons than healthy individuals. In this paper we address these problems by examining benefits/costs when evaluating the performance of classifiers. Furthermore, we focus on multiclass classification, where the outcome can be one of three or more options. We propose modifications to the Naive Bayes and Logistic Regression algorithms to incorporate costs and benefits for the multiclass scenario. We compare these to an existing algorithm, hierarchical cost-sensitive kernel logistic regression, and to an adapted hierarchical approach that uses our cost-benefit based logistic regression model. We demonstrate the effectiveness of all approaches for fetal health classification, but the proposed approaches can be applied to any imbalanced dataset where benefits and costs are important.
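
A small worked example of why accuracy misleads on skewed data, assuming a hypothetical 100-patient sample and a trivial classifier that always predicts the majority class. The class names and counts are invented for illustration and do not come from the cited work.

```python
# Illustrative only: accuracy vs. per-class recall on an imbalanced sample.
# The labels and class counts below are hypothetical.
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its members predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# 90 healthy, 7 mild, 3 severe patients (hypothetical).
y_true = ["healthy"] * 90 + ["mild"] * 7 + ["severe"] * 3

# A classifier that ignores its input and always predicts the majority class.
y_pred = ["healthy"] * len(y_true)

overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)

print("accuracy:", overall)
print("per-class recall:", recalls)
```

The classifier scores 90% accuracy while detecting none of the mild or severe cases, which is the failure that per-class or benefit-weighted evaluation is meant to expose.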

