2023
DOI: 10.20944/preprints202302.0117.v1
Preprint

Evaluation of Synthetic Categorical Data Generation Techniques for Predicting Cardiovascular Diseases and Post-hoc Interpretability of the Risk Factors

Abstract: Machine learning (ML) methods have become important for enhancing the performance of predictive decision-support models. However, class imbalance is one of the main challenges in developing ML models, because it limits their generalization and biases the learning algorithms. In this paper, we consider oversampling methods for generating synthetic categorical clinical data, aiming to improve the predictive performance of ML models and the identification of risk factors for cardiovascular diseases (…
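This excerpt of the abstract does not say which oversampling methods are compared. The sketch below only illustrates the basic idea: plain random oversampling of categorical records, in which minority-class rows are resampled with replacement until every class matches the size of the largest one. It is an assumed baseline, not the paper's method. The function name `random_oversample` and the feature name `smoker` are hypothetical.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Plain random oversampling (illustrative baseline, not the paper's method).

    rows   -- list of dicts mapping categorical feature names to values
    labels -- list of class labels, aligned with rows
    Minority-class rows are duplicated at random until each class has as
    many rows as the largest class.
    """
    rng = random.Random(seed)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    target = max(counts.values())

    out_rows, out_labels = [dict(r) for r in rows], list(labels)
    for cls, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        for _ in range(target - n):
            j = rng.choice(idx)
            out_rows.append(dict(rows[j]))  # copy so the new row is independent
            out_labels.append(cls)
    return out_rows, out_labels


if __name__ == "__main__":
    # Hypothetical toy data: three majority (0) records, one minority (1) record.
    data = [{"smoker": "no"}, {"smoker": "no"}, {"smoker": "yes"}, {"smoker": "no"}]
    balanced_rows, balanced_labels = random_oversample(data, [0, 0, 1, 0])
    print(balanced_labels.count(0), balanced_labels.count(1))
```

Random oversampling only duplicates existing records, so it adds no new information and can encourage overfitting. That limitation is what motivates methods that generate genuinely new synthetic records.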

Cited by 4 publications (2 citation statements)
References 49 publications
“…Hence, SHAP shows the contribution of each feature to the final prediction and can support the interpretability of data-driven models. The SHAP has shown great ability to provide model interpretability in multiple clinical studies [60][61][62].…”
Section: Post-hoc Interpretability Methods (mentioning; confidence: 99%)
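The quoted statement says that SHAP gives "the contribution of each feature to the final prediction". The sketch below makes that idea concrete by computing exact Shapley values, enumerating every feature coalition. It rests on two assumptions:

- Features absent from a coalition are replaced by a single baseline value.
- `risk` is a hypothetical toy scoring function, not a clinical model.

Exact enumeration needs on the order of 2^n model evaluations. This is why the SHAP library relies on approximations such as KernelSHAP or on model-specific algorithms such as TreeSHAP.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline point.

    Features outside a coalition are set to their baseline value
    (an assumed, simple choice of value function).
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition come from x, the rest from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def risk(z):
    # Hypothetical toy score: two main effects and one interaction (features 0 and 2).
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


if __name__ == "__main__":
    print(shapley_values(risk, [1, 1, 1], [0, 0, 0]))
```

The attributions add up to `risk(x) - risk(baseline)`, which is the efficiency property. The interaction term is split equally between the two features involved, so the result is approximately [0.6, 0.3, 0.1].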
“…Recently, the novel conditional tabular GAN (CTGAN) has shown excellent performance for addressing the main issues in the generation of mixed-type tabular data [26]. In the clinical setting, different authors have used CTGAN variants [27,28] to create synthetic data that conserve underlying distribution from original data by aiming to enhance the results in predictive tasks.…”
Section: Introduction (mentioning; confidence: 99%)
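The snippet claims that synthetic data "conserve underlying distribution from original data". One simple way to check this for a single categorical column is the total variation distance between its empirical distributions in the real and the synthetic dataset. This is a minimal sketch, not the evaluation used by CTGAN or by the cited works. It assumes each dataset is a list of dicts with categorical values; the function name `marginal_tvd` and the column `smoker` are hypothetical. A per-column check like this ignores dependencies between columns.

```python
from collections import Counter


def marginal_tvd(real, synthetic, column):
    """Total variation distance between the empirical distributions of one
    categorical column in two datasets (lists of dicts).

    0.0 means identical marginals; 1.0 means the two datasets share no category.
    """
    p = Counter(row[column] for row in real)
    q = Counter(row[column] for row in synthetic)
    categories = set(p) | set(q)
    # A Counter returns 0 for categories it has never seen.
    return 0.5 * sum(abs(p[c] / len(real) - q[c] / len(synthetic)) for c in categories)


if __name__ == "__main__":
    real = [{"smoker": "yes"}, {"smoker": "yes"}, {"smoker": "no"}, {"smoker": "no"}]
    synth = [{"smoker": "yes"}, {"smoker": "yes"}, {"smoker": "yes"}, {"smoker": "no"}]
    print(marginal_tvd(real, synth, "smoker"))
```

In the toy example the real column is split 50/50 and the synthetic column 75/25, giving a distance of 0.25.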