BACKGROUND
ChatGPT and other large language models (LLMs) are trained on extensive text data; they learn patterns and associations within the training text without an inherent understanding of the underlying causal mechanisms. Establishing causation necessitates controlled experiments and observations, and as of May 2024, ChatGPT lacks access to experimental data and the capacity to learn analytical models from data. Recent advancements from OpenAI enable the creation of custom Generative Pre-trained Transformer (GPT) models using their GPT Builder. These custom GPTs can be tailored with causal knowledge from Causal Bayesian Networks (CBNs), producing a knowledgeable health-recommender system with causal expertise. Such a system is not only easily accessible to patients, clinicians, and other users, but also has the potential to significantly improve healthcare outcomes.
OBJECTIVE
This paper presents a practical solution: a custom GPT model integrated with Causal Bayesian Networks (CBNs) informed by Authoritative Medical Ontologies (AMOs) as prior foundational knowledge. AMOs are robust biomedical ontologies that encapsulate the expert knowledge of their creators. By utilizing the structured information contained within these ontologies, we can generate an informed CBN, which in turn can improve a custom GPT serving as a health-recommender system. These enhanced GPTs offer insight into cause-and-effect relationships among comorbid symptoms within the disease domain.
METHODS
To demonstrate our recommender system, we learn a CBN from National Institute of Mental Health (NIMH) data on patients with Alzheimer’s Disease (AD) and augment it with causal knowledge from the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM). We compute the CBN using the Max-Min Hill-Climbing (MMHC) algorithm, generating two separate networks so that we can compare predictive accuracy between a baseline CBN and one modified with causal mechanisms from ICD-10-CM. Our previous research using this method produced a modified CBN that reflects the causal claims in the AMOs and agrees with both the AMOs and the observational dataset. With this causal model, we build a custom GPT using OpenAI’s GPT Builder.
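The idea of using ontology-derived ordered variable pairs to constrain structure learning can be sketched in a few lines. This is a minimal illustration only: the symptom names, the pair set, and the `constrain_edges` helper are hypothetical and do not reflect the paper's actual variables or implementation.

```python
# Hypothetical sketch: filter candidate edges in a structure search so that
# no edge contradicts an ordered pair (cause, effect) derived from an ontology
# such as ICD-10-CM. Variable names are illustrative, not from the study data.
def constrain_edges(candidate_edges, ontology_pairs):
    """Drop any candidate edge (a, b) when the ontology asserts b causes a."""
    reversed_pairs = {(b, a) for (a, b) in ontology_pairs}
    return [edge for edge in candidate_edges if edge not in reversed_pairs]

# Hypothetical ontology-derived ordering: memory_loss precedes confusion.
ontology = {("memory_loss", "confusion")}
candidates = [("memory_loss", "confusion"), ("confusion", "memory_loss")]
print(constrain_edges(candidates, ontology))  # [('memory_loss', 'confusion')]
```

In a full pipeline, a filter of this kind would be applied to the candidate edges considered during the hill-climbing phase of MMHC, so the learned network cannot orient an edge against the ontology's causal claim.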
RESULTS
The custom GPT contains both potentially causal and correlational relationships among symptoms, along with conditional probabilities for these relationships. It also contains knowledge of causal mechanisms from ICD-10-CM, extended information about the symptom variables, and references to existing literature on comorbid Alzheimer’s Disease symptoms. Furthermore, because the modified CBN establishes potentially causal relationships among symptoms that can be verified against existing epidemiological research, we can verify that the custom GPT establishes these same relationships. The result is a GPT that agrees with the modified CBN, which is itself a representation of existing subject-matter expertise in the disease domain.
CONCLUSIONS
To obtain our CBN, we use a previously developed methodology that extracts ordered variable pairs from the authoritative ontology ICD-10-CM as prior expertise. Our past research showed that a CBN learned with MMHC can be improved significantly by incorporating prior sources of knowledge, provided the algorithm is modified appropriately. This prior knowledge can be validated against existing literature as a sequence of events in AD progression, specific causal mechanisms among comorbid symptoms, and conditional probabilities from a Bayesian Network. The resulting modified network provides insight into the causal relationships expressed in the AD data and takes advantage of the expertise and knowledge contained in the AMOs. Because causal inference from a CBN does not occur in a vacuum, the relationships within the CBN, regardless of the strength of their conditional probabilities, must be explored further. A custom GPT extends the causal knowledge within a CBN with general knowledge of symptoms and diseases, providing a tool capable of suggesting causal inferences based on the analysis of real patient data. With an LLM, uncertainty is present but reasoned with, just as it is in the prior, posterior, and likelihood of Bayes’ Theorem. Moving forward, we would like to explore using ChatGPT to produce ordered variable pairs and to automate the validation of potentially causal information.
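The Bayes' Theorem analogy can be made concrete with a minimal posterior computation. The probability values below are hypothetical, chosen only to show how prior, likelihood, and posterior interact; they are not drawn from the study data.

```python
# Minimal sketch of Bayes' Theorem for a binary hypothesis H and evidence E:
#   P(H|E) = P(E|H) P(H) / [ P(E|H) P(H) + P(E|~H) P(~H) ]
def posterior(prior, likelihood, likelihood_complement):
    """Return P(H|E) given P(H), P(E|H), and P(E|~H)."""
    evidence = likelihood * prior + likelihood_complement * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers for illustration: a 10% prior on a symptom's cause,
# evidence that is four times as likely under the hypothesis as under its
# complement, yields a posterior of about 0.308.
p = posterior(prior=0.10, likelihood=0.80, likelihood_complement=0.20)
print(round(p, 3))  # 0.308
```

Each conditional probability table in a CBN encodes exactly these quantities locally, which is why the network's relationships can be reasoned about under uncertainty rather than asserted outright.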