Prediction of oxygen requirement in patients with COVID-19 using a pre-trained chest radiograph xAI model: efficient development of auditable risk prediction models via a fine-tuning approach

Chung, Joowon; Kim, Doyun; Choi, Jongmun; Yune, Sehyo; Song, Kyungdoo; Kim, Seonkyoung; Chua, Michelle; Succi, Marc D.; Conklin, John; Longo, Maria Gabriela Figueiró; Ackman, Jeanne B.; Petranović, Milena; Lev, Michael H.; Do, Synho

doi:10.1038/s41598-022-24721-5

Cited by 15 publications

(12 citation statements)

References 26 publications


“…Given the important ongoing discourse [3][4][5][6][7][8] surrounding bias in the clinical setting and bias in artificial intelligence, we believe our analysis of ChatGPT's performance based on the age and gender of patients represents an important touchpoint in both discussions. [21][22][23][24][25] While we did not find that age or gender is a significant predictor of accuracy, we note that our vignettes represent classic presentations of disease, and that atypical presentations may generate different biases.…”

Section: Discussion (mentioning)

confidence: 99%

“…Despite its relative infancy, artificial intelligence (AI) is transforming healthcare, with current uses including workflow triage, predictive models of utilization, labeling and interpretation of radiographic images, patient support via interactive chatbots, communication aids for non-English speaking patients, and more. [1][2][3][4][5][6][7][8] Yet, all of these use cases are limited to a specific part of the clinical workflow and do not provide longitudinal patient or clinician support. An under-explored use of AI in medicine is predicting and synthesizing patient diagnoses, treatment plans, and outcomes.…”

Section: Introduction (mentioning)

confidence: 99%


Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow

Rao, Pang, Kim et al. 2023. Preprint. Self cite.

IMPORTANCE: Large language model (LLM) artificial intelligence (AI) chatbots direct the power of large training datasets towards successive, related tasks, as opposed to single-ask tasks, for which AI already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as virtual physicians, has not yet been evaluated. OBJECTIVE: To evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes. DESIGN: We inputted all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. SETTING: ChatGPT, a publicly available LLM. PARTICIPANTS: Clinical vignettes featured hypothetical patients with a variety of age and gender identities, and a range of Emergency Severity Indices (ESIs) based on initial clinical presentation. EXPOSURES: MSD Clinical Manual vignettes. MAIN OUTCOMES AND MEASURES: We measured the proportion of correct responses to the questions posed within the clinical vignettes tested. RESULTS: ChatGPT achieved 71.7% (95% CI, 69.3% to 74.1%) accuracy overall across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI, 67.8% to 86.1%), and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI, 54.2% to 66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=-15.8%, p<0.001) and clinical management (β=-7.4%, p=0.02) type questions. CONCLUSIONS AND RELEVANCE: ChatGPT achieves impressive accuracy in clinical decision making, with particular strengths emerging as it has more clinical information at its disposal.


“…In the context of medical platforms, explainable AI (XAI) is crucial 61 , particularly when it comes to the prediction of myocardial infarction (MI) probability using survey data. Transparency and interpretability 62 are crucial in the healthcare industry since decisions based on AI-driven models may have far-reaching effects 32,34 . In addition to improving the credibility and dependability of predictive models, XAI provides healthcare professionals with the knowledge required to comprehend the logic behind AI-generated predictions.…”

Section: Interpretability Analysis (mentioning)

confidence: 99%

“…The interpretability and explainability of artificial intelligence (AI) models are critical in the medical arena since healthcare practitioners demand insights into the model’s decision-making process 32,33 . Deep learning models, particularly neural networks, have been criticized for their “black-box” nature, which makes it difficult to grasp the logic behind the predictions made by these approaches 34,35,36,37,38,39,40 . This study intends to overcome these important issues by proposing reliable, explainable, and thus more transparent methods for exploring cutting-edge deep-learning techniques for medical research and practice.…”

Section: Introduction (mentioning)

confidence: 99%

Identification of Myocardial Infarction (MI) Probability from Imbalanced Medical Survey Data: An Artificial Neural Network (ANN) with Explainable AI (XAI) Insights

Akter, Pias et al. 2024. Preprint.

In the healthcare industry, many artificial intelligence (AI) models have attempted to overcome bias from class imbalances while also maintaining high results. Firstly, when utilizing a large number of unbalanced samples, current AI models and related research have failed to balance specificity and sensitivity – a problem that can undermine the reliability of medical research. Secondly, no reliable method for obtaining detailed interpretability has been put forth when addressing large numbers of input features. The present research addresses these two critical research gaps with a proposed lightweight Artificial Neural Network (ANN) model. Using 43 input features from the 2021 Behavioral Risk Factor Surveillance System (BRFSS) dataset, the proposed model outperforms prior models in producing balanced outcomes from markedly unbalanced large survey data. The efficacy of this proposed ANN model is attributed to its simplified design, which reduces processing demands, and its resilience in identifying the probability of myocardial infarction (MI). This is demonstrated by its 80% specificity and 77% sensitivity, and is substantiated by a Receiver Operating Characteristic Area Under the Curve (AUC) of 0.87. The outcomes across the scopes of each specified data domain were also separately represented, thus demonstrating the proposed model’s robust sensitivity. The interpretability of the model, as measured by Shapley values, reveals substantial correlations between myocardial infarction (MI) and its risk factors, including long-term medical conditions, socio-demographic factors, personal health habits, economic and social status, healthcare availability and affordability, as well as impairment statuses, providing valuable insights for improved cardiovascular risk assessment and personalized healthcare strategies.


“…Between the first COVID-19 diagnosis in France and the availability of these templates, French radiologists wrote their reports according to their own experience in thoracic imaging and the objective abnormalities on chest CT. So far, most studies using artificial intelligence have applied a supervised methodology on medical images in order to perform patients’ triage, distinguishing common pneumonitis from COVID-19 lung disease, assessing the severity of the COVID-19 lung disease, or anticipating oxygen requirement thanks to classical machine-learning or deep-learning algorithms [ 12 – 16 ]. Regarding NLP application, Li et al trained supervised machine-learning models to automatically identify CT reports with the diagnosis of acute appendicitis, diverticulitis, and bowel obstruction and secondarily applied those models on a large population to investigate the impact of the COVID-19 pandemic on their detection in emergency departments [ 17 ].…”

Section: Introduction (mentioning)

confidence: 99%

Using the Textual Content of Radiological Reports to Detect Emerging Diseases: A Proof-of-Concept Study of COVID-19

Crombé, Lecomte, Seux et al. 2024. J Digit Imaging. Inform. med.

Changes in the content of radiological reports at population level could detect emerging diseases. Herein, we developed a method to quantify similarities in consecutive temporal groupings of radiological reports using natural language processing, and we investigated whether appearance of dissimilarities between consecutive periods correlated with the beginning of the COVID-19 pandemic in France. CT reports from 67,368 consecutive adults across 62 emergency departments throughout France between October 2019 and March 2020 were collected. Reports were vectorized using term frequency–inverse document frequency (TF-IDF) analysis on one-grams. For each successive 2-week period, we performed unsupervised clustering of the reports based on TF-IDF values and partition-around-medoids. Next, we assessed the similarities between this clustering and a clustering from two weeks before according to the average adjusted Rand index (AARI). Statistical analyses included (1) cross-correlation functions (CCFs) with the number of positive SARS-CoV-2 tests and advanced sanitary index for flu syndromes (ASI-flu, from open-source dataset), and (2) linear regressions of time series at different lags to understand the variations of AARI over time. Overall, 13,235 chest CT reports were analyzed. AARI was correlated with ASI-flu at lag = + 1, + 5, and + 6 weeks (P = 0.0454, 0.0121, and 0.0042, respectively) and with SARS-CoV-2 positive tests at lag = − 1 and 0 week (P = 0.0057 and 0.0001, respectively). In the best fit, AARI correlated with the ASI-flu with a lag of 2 weeks (P = 0.0026), SARS-CoV-2-positive tests in the same week (P < 0.0001) and their interaction (P < 0.0001) (adjusted R2 = 0.921). Thus, our method enables the automatic monitoring of changes in radiological reports and could help capture disease emergence.
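The abstract above compares clusterings with an adjusted Rand index. As a rough illustration of that index only, the sketch below computes it for two labelings of the same set of items. The function name `adjusted_rand_index` and the toy labels are assumptions made for this example. The abstract does not say how clusterings from different 2-week periods were aligned or averaged, so this is not the authors' pipeline.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items (illustrative)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of items that share a cluster in both labelings (contingency cells).
    cells = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    # Pairs of items that share a cluster within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings place every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical): same partition with cluster names swapped,
# then a partition that cuts across the first one.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index depends only on which items are grouped together, not on the cluster names, so relabeled but identical partitions score 1, while partitions that agree no better than chance score near 0 or below.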

