Physicians make critical time-constrained decisions every day. Clinical predictive models can help physicians and administrators make decisions by forecasting clinical and operational events. Existing structured data-based clinical predictive models have limited use in everyday practice owing to complexity in data processing, as well as model development and deployment [1–3]. Here we show that unstructured clinical notes from the electronic health record can enable the training of clinical language models, which can be used as all-purpose clinical predictive engines with low-resistance development and deployment. Our approach leverages recent advances in natural language processing [4,5] to train a large language model for medical language (NYUTron) and subsequently fine-tune it across a wide range of clinical and operational predictive tasks. We evaluated our approach within our health system for five such tasks: 30-day all-cause readmission prediction, in-hospital mortality prediction, comorbidity index prediction, length-of-stay prediction, and insurance denial prediction. We show that NYUTron has an area under the curve (AUC) of 78.7–94.9%, with an improvement of 5.36–14.7% in the AUC compared with traditional models. We additionally demonstrate the benefits of pretraining with clinical text, the potential for increasing generalizability to different sites through fine-tuning, and the full deployment of our system in a prospective, single-arm trial. These results show the potential for using clinical language models in medicine to read alongside physicians and provide guidance at the point of care.
Background: Tumor programmed death-ligand 1 (PD-L1) status is useful in determining which patients may benefit from programmed death-1 (PD-1)/PD-L1 inhibitors. However, little is known about the association between PD-L1 status and tumor histopathological patterns. Using deep learning, we predicted PD-L1 status from hematoxylin and eosin (H&E) whole-slide images (WSIs) of non-small cell lung cancer (NSCLC) tumor samples. Materials and Methods: One hundred and thirty NSCLC patients were randomly assigned to training (n = 48) or test (n = 82) cohorts. A pair of H&E and PD-L1-immunostained WSIs was obtained for each patient. A pathologist annotated PD-L1 positive and negative tumor regions on the training samples using immunostained WSIs for reference. From the H&E WSIs, over 145,000 training tiles were generated and used to train a multi-field-of-view deep learning model with a residual neural network backbone. Results: The trained model accurately predicted tumor PD-L1 status on the held-out test cohort of H&E WSIs, which was balanced for PD-L1 status (area under the receiver operating characteristic curve [AUC] = 0.80, P << 0.01). The model remained effective over a range of PD-L1 cutoff thresholds (AUC = 0.67–0.81, P ≤ 0.01) and when different proportions of the labels were randomly shuffled to simulate interpathologist disagreement (AUC = 0.63–0.77, P ≤ 0.03). Conclusions: A robust deep learning model was developed to predict tumor PD-L1 status from H&E WSIs in NSCLC. These results suggest that PD-L1 expression is correlated with the morphological features of the tumor microenvironment.
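Both abstracts report performance as AUC, and the second also describes shuffling a proportion of labels to simulate interpathologist disagreement. The sketch below illustrates those two ideas on synthetic data only. The helper names `roc_auc` and `shuffle_fraction`, the score distributions, and the exact shuffling procedure (permuting labels within a randomly chosen subset) are assumptions made for illustration; they are not taken from either paper and do not reproduce any reported result.

```python
import random


def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def shuffle_fraction(labels, fraction, rng):
    """Return a copy of `labels` in which a random subset of the given
    size fraction has had its labels permuted among themselves.
    This is one possible reading of 'randomly shuffled labels'."""
    noisy = list(labels)
    k = round(fraction * len(noisy))
    idx = rng.sample(range(len(noisy)), k)
    picked = [noisy[i] for i in idx]
    rng.shuffle(picked)
    for i, value in zip(idx, picked):
        noisy[i] = value
    return noisy


# Synthetic, balanced labels and scores (hypothetical, for illustration only).
rng = random.Random(0)
labels = [1] * 41 + [0] * 41
scores = [rng.gauss(0.6 if y == 1 else 0.4, 0.2) for y in labels]

for fraction in (0.0, 0.1, 0.2, 0.3):
    noisy_labels = shuffle_fraction(labels, fraction, rng)
    print(f"shuffled {fraction:.0%} of labels: AUC = {roc_auc(noisy_labels, scores):.2f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable at this scale; for larger cohorts a sort-based implementation or an established library routine would be the more practical choice.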