It is widely estimated that approximately 10% of the population has type 2 diabetes. Unfortunately, the impact of this disease is underestimated. Patient mortality is often due to complications caused by the disease rather than the disease itself. Many techniques used to model diseases take the form of a “black box” whose internal workings and complexities are extremely difficult to understand, from both practitioners' and patients' perspectives. In this work, we address this issue and present an informative model/pattern, known as a “latent phenotype,” that aims to capture the complexities of the associated complications over time. We further extend this idea by combining temporal association rule mining and unsupervised learning to find explainable subgroups of patients with more personalized prediction. Our extensive findings show how uncovering the latent phenotype helps distinguish the disparities among subgroups of patients based on their complication patterns. We gain insight into how best to enhance prediction performance and reduce bias in the applied models using uncertainty in the patients' data.
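The abstract does not say how its temporal association rules are defined, so the following is only a minimal sketch of one common formulation: a rule "A then B" holds for a patient when complication A first appears at an earlier visit than complication B. The function name `temporal_rules`, the onset-dictionary input format, and both thresholds are assumptions made for illustration, not the authors' method.

```python
from itertools import permutations


def temporal_rules(onsets, min_support=0.3, min_confidence=0.5):
    """Mine simple "A precedes B" rules from per-patient onset visits.

    onsets: list of dicts, one per patient, mapping a complication name
            to the visit index at which it was first recorded (assumed format).
    Returns a list of (A, B, support, confidence) tuples.
    """
    if not onsets:
        return []
    n_patients = len(onsets)
    complications = sorted({name for patient in onsets for name in patient})
    rules = []
    for a, b in permutations(complications, 2):
        # Patients in whom A was observed at all.
        with_a = sum(1 for patient in onsets if a in patient)
        if with_a == 0:
            continue
        # Patients in whom A was first recorded strictly before B.
        a_then_b = sum(
            1 for patient in onsets
            if a in patient and b in patient and patient[a] < patient[b]
        )
        support = a_then_b / n_patients
        confidence = a_then_b / with_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical toy cohort: complication -> first visit index.
cohort = [
    {"hypertension": 1, "retinopathy": 3},
    {"hypertension": 2, "retinopathy": 4},
    {"hypertension": 1},
    {"retinopathy": 2, "hypertension": 5},
]
rules = temporal_rules(cohort)
```

Rules that survive the thresholds could then serve as features for an unsupervised grouping of patients, which is one way to read the combination the abstract describes.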
Comorbidities such as hypertension and disorders of lipid metabolism are often associated with diseases such as diabetes, and their early prediction is of great value when trying to manage progression. This is the start of a project to model multiple comorbidities in diabetes using dynamic Bayesian networks with latent variables in order to stratify patient cohorts. In this paper, we demonstrate some initial results on a dataset where class imbalance poses a problem because individual comorbidities occur rarely on a visit-by-visit basis. This is dealt with using a bootstrap technique specifically designed for longitudinal data in which the positive class occurs far less often than the negative class.
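The abstract does not give the details of its bootstrap, so the sketch below shows only one plausible way to rebalance longitudinal data: whole patient visit sequences are resampled with replacement, so temporal order within a patient is preserved, and patients with at least one positive visit are drawn more often. The function name `balanced_patient_bootstrap`, the input format, and the `positive_share` parameter are all assumptions for illustration.

```python
import random


def balanced_patient_bootstrap(patients, n_samples, positive_share=0.5, seed=0):
    """Resample whole patient sequences with replacement, favouring positives.

    patients: dict mapping a patient id to a list of 0/1 visit labels
              (1 = comorbidity present at that visit); assumed format.
    positive_share: probability that a draw comes from the pool of patients
                    with at least one positive visit.
    Returns a list of (patient_id, visit_labels) pairs.
    """
    rng = random.Random(seed)
    positives = [pid for pid, visits in patients.items() if any(visits)]
    negatives = [pid for pid, visits in patients.items() if not any(visits)]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative patient")
    sample = []
    for _ in range(n_samples):
        # Choose the pool first, then a patient within it, so the rare
        # positive patients are not swamped by the majority class.
        pool = positives if rng.random() < positive_share else negatives
        pid = rng.choice(pool)
        # The visit sequence is kept intact to preserve its ordering.
        sample.append((pid, patients[pid]))
    return sample


# Hypothetical cohort: one patient in ten ever shows the comorbidity.
cohort = {f"p{i}": [0, 0, 0, 0] for i in range(9)}
cohort["p9"] = [0, 0, 1, 1]
resampled = balanced_patient_bootstrap(cohort, n_samples=1000)
```

Sampling at the patient level rather than the visit level is a design choice here: it avoids breaking the within-patient dependence that a dynamic Bayesian network would be trained on.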
Clinicians predict disease and related complications based on prior knowledge and each individual patient's clinical history. The prediction process is complex due to the existence of unmeasured risk factors, the unexpected development of complications, and the varying responses of patients to disease over time. Exploiting these unmeasured risk factors (hidden variables) can improve the modeling of disease progression and thus enable clinicians to focus on early diagnosis and treatment of unexpected conditions. However, the overuse of hidden variables can lead to complex models that overfit and are not well understood (being 'black box' in nature). Identifying and understanding groups of patients with similar disease profiles (based on discovered hidden variables) makes it possible to better understand disease progression in different patients while improving prediction. We explore the use of a stepwise method for incrementally identifying hidden variables based on the Inductive Causation (IC*) algorithm. We exploit Dynamic Time Warping and hierarchical clustering to cluster patients based upon these hidden variables to uncover their meaning with respect to the complications of Type 2 Diabetes Mellitus patients. Our results reveal that inferring a small number of targeted hidden variables and using them to cluster patients not only improves prediction accuracy but also helps explain the different discovered sub-groups.
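To make the clustering step concrete, here is a minimal sketch of Dynamic Time Warping followed by agglomerative (hierarchical) clustering, written in plain Python. The abstract does not state the linkage criterion or the distance between time points, so average linkage and absolute difference are assumptions, as are the function names and the toy hidden-variable trajectories.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference step cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def agglomerative_clusters(series, n_clusters):
    """Average-linkage agglomerative clustering on pairwise DTW distances.

    Returns a list of clusters, each a sorted list of indices into `series`.
    """
    dist = [[dtw_distance(s, t) for t in series] for s in series]
    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = len(clusters[x]) * len(clusters[y])
                link = sum(
                    dist[i][j] for i in clusters[x] for j in clusters[y]
                ) / pairs
                if best is None or link < best[0]:
                    best = (link, x, y)
        _, x, y = best
        # Merge the two closest clusters and drop the absorbed one.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical hidden-variable states per visit for four patients.
trajectories = [
    [0, 0, 1, 1],
    [0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
groups = agglomerative_clusters(trajectories, n_clusters=2)
```

DTW lets trajectories of different lengths, or with the same state change at different visits, be compared directly, which is why the first two toy patients end up at distance zero despite having different numbers of visits.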