e19318 Background: ECOG PS is a prognostic indicator of outcomes, and scores of 0-1 (good ECOG PS) are often required for clinical trial enrollment. Patients treated in non-trial settings often lack ECOG PS scores, which limits the use of real-world data from these patients in external control arms (ECAs) and reduces the specificity achievable in clinical effectiveness research. Machine learning (ML) can be used to impute ECOG PS scores from other clinical data at various points during treatment. Methods: We developed a series of models using logistic regression (LR) or XGBoost (XGB) that impute ECOG PS at initial diagnosis, metastatic diagnosis and final evaluation, using a curated non-small cell lung cancer cohort of 31,425 patients with at least one ECOG PS score. Results: Using large numbers (i.e. thousands) of features, AUC-ROC values of up to 0.81 could be obtained for imputing a patient’s final ECOG PS, with lower AUC values when imputing ECOG PS at initial and metastatic diagnosis. We also developed more interpretable models with 110 or 40 features; these had reduced but still satisfactory AUC, with an accuracy of around 80% for predicting good ECOG PS scores. Key features were obtained from lab tests, physical exams, comorbidities, medications, age and metastatic status. The table below shows the results of several of these models. Where the models misclassified ECOG PS, the error was rarely greater than 1 grade. Conclusions: ECOG PS is itself subjective, which suggests that ML-based cohort assignment will be sufficiently accurate to support its use in research. Further work will be required to assess whether the ML-predicted cohorts have different outcomes. [Table: see text]
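The two summary measures quoted in this abstract (accuracy on good ECOG PS, and how often a misclassification is off by more than 1 grade) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the scores are made-up toy values rather than study data, and reading "accuracy of predicting good ECOG PS" as the share of truly good-PS patients who are also predicted good is an assumption, since the abstract does not define it.

```python
def is_good_ps(ecog):
    """Good performance status as defined in the abstract: ECOG 0-1."""
    return ecog <= 1


def summarize(true_scores, predicted_scores):
    """Return (accuracy on good-PS patients, share of errors larger than 1 grade).

    Assumption: 'accuracy' here is the fraction of patients with a true
    score of 0-1 whose imputed score is also 0-1.
    """
    pairs = list(zip(true_scores, predicted_scores))

    # Patients whose recorded ECOG PS is good (0-1).
    good = [(t, p) for t, p in pairs if is_good_ps(t)]
    good_accuracy = sum(is_good_ps(p) for _, p in good) / len(good)

    # Size of each misclassification, in ECOG grades.
    errors = [abs(t - p) for t, p in pairs if t != p]
    large_error_share = (
        sum(e > 1 for e in errors) / len(errors) if errors else 0.0
    )
    return good_accuracy, large_error_share


# Toy values for illustration only (not taken from the study).
true_scores = [0, 1, 1, 2, 3, 0, 1, 2]
predicted_scores = [0, 1, 2, 2, 1, 1, 1, 3]
good_accuracy, large_error_share = summarize(true_scores, predicted_scores)
```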
6556 Background: Survival prediction models for lung cancer patients could help guide their care and therapy decisions. The objective of this study was to predict the probability of survival beyond 90, 180 and 360 days from any point in a lung cancer patient’s journey. Methods: We developed a gradient boosting model (XGBoost) using data from 55k lung cancer patients in the ASCO CancerLinQ database. The model used 3958 unique variables, including Dx and Rx codes, biomarkers, surgeries and lab tests from ≤1 year prior to the prediction point, which was chosen at random for each patient. We used 40% of the data for training, 25% for hyper-parameter tuning, 20% for testing and 15% for holdout validation. Death dates available in the Electronic Health Record were cross-checked by linkage to death registries. Results: The model was validated on the holdout set of 8,468 patients. The Area Under the Curve (AUC) for the model was 0.79. The precision and recall for predicting survival beyond the three time points were 0.7-0.8 and 0.8-0.9, respectively (see table). This compares favourably to other lung cancer survival models created using different machine learning techniques (Jochems 2017, Dekker 2009). A Cox-PH model created using the top 20 variables also had significantly lower performance (see table). Analysis of input variables yielded distinctive patterns for patient subgroups and time points. Tumor status, medications, lab values and functional status were found to be significant in patient sub-cohorts. Conclusions: An AI model to predict survival of lung cancer patients, built using a large real-world dataset, yielded high accuracy. This general model can further be used to predict survival of sub-cohorts stratified by variables such as stage or various treatment effects. Such a model could be useful for assessing patient risk and treatment options, evaluating cost and quality of care, or determining clinical trial eligibility. [Table: see text]
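The 40/25/20/15 partition described in the Methods can be sketched as below. This is a minimal sketch under stated assumptions: the abstract does not say how patients were assigned to partitions, so the patient-level random shuffle, the `split_patients` helper, the partition names and the seed are all hypothetical.

```python
import random


def split_patients(patient_ids, fractions=(0.40, 0.25, 0.20, 0.15), seed=0):
    """Randomly partition patient IDs into train/tune/test/holdout sets.

    Assumption: splitting is done per patient with a simple shuffle;
    the abstract gives only the percentages.
    """
    names = ("train", "tune", "test", "holdout")
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)

    splits, start = {}, 0
    for i, (name, frac) in enumerate(zip(names, fractions)):
        # The last partition takes the remainder so no patient is dropped.
        if i == len(names) - 1:
            end = len(ids)
        else:
            end = start + int(round(frac * len(ids)))
        splits[name] = ids[start:end]
        start = end
    return splits


# Example with 1000 placeholder IDs.
splits = split_patients(range(1000))
sizes = {name: len(ids) for name, ids in splits.items()}
```

Splitting by patient rather than by record keeps all of one patient's data in a single partition, which is one common way to avoid leakage between training and holdout sets.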
impact of varying adherence rates on the relative benefits of FIT and mt-sDNA screening. Methods: Sensitivity and specificity from DeeP-C trial data were used for screening inputs. Predicted outcomes of annual FIT and triennial mt-sDNA were simulated for individuals born in 1975 who were free of diagnosed CRC at age 40 and screened between ages 50-75. Adherence was set by assuming a fixed annual likelihood to comply ranging from 0-100%, in 10% increments. It was assumed that patients were offered a stool-based screening test yearly unless they were not due for screening. Predicted outcomes are per 1000 individuals versus no screening. Results: Each screening strategy yielded higher life-years gained (LYG) versus no screening. At perfect adherence, mt-sDNA resulted in 4.1% fewer LYG (LYG=302.2; colonoscopies=1856) versus FIT (LYG=315.2; colonoscopies=1915). At imperfect adherence rates of 70% for triennial mt-sDNA and 40% for annual FIT, mt-sDNA resulted in a 19.1% increase in LYG (288.9; colonoscopies=1724) versus FIT (242.5; colonoscopies=1218). LYG for FIT was more sensitive to per-unit change in adherence rates ([315.2 - 101.2]/[100% - 10%] = 2.4 LYG/unit change) than mt-sDNA (1.8 LYG/unit change). At equivalent adherence, mt-sDNA generally resulted in higher colonoscopies and lower stool testing vs FIT. Conclusions: Stool-based CRC screening provides higher LYG vs no screening, regardless of adherence assumptions. The comparative effectiveness of FIT versus mt-sDNA screening changes dramatically when assuming adherence is <100%, with mt-sDNA outperforming FIT under adherence assumptions that are more consistent with available, although currently incomplete, real-world evidence.
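The percentages and the FIT slope quoted in the Results can be reproduced from the reported LYG values. The sketch below assumes the FIT slope expression reads (315.2 - 101.2)/(100% - 10%), so that 101.2 is the FIT LYG at 10% adherence; the function and variable names are hypothetical.

```python
# LYG values quoted in the abstract (per 1000 individuals vs. no screening).
LYG_FIT_100 = 315.2     # annual FIT, 100% adherence
LYG_MTSDNA_100 = 302.2  # triennial mt-sDNA, 100% adherence
LYG_FIT_40 = 242.5      # annual FIT, 40% adherence
LYG_MTSDNA_70 = 288.9   # triennial mt-sDNA, 70% adherence
LYG_FIT_10 = 101.2      # assumed: FIT at 10% adherence, from the slope expression


def relative_change_pct(new, reference):
    """Percentage change of `new` relative to `reference`."""
    return 100.0 * (new - reference) / reference


def lyg_per_adherence_point(lyg_high, lyg_low, adherence_high, adherence_low):
    """Average LYG gained per percentage point of adherence."""
    return (lyg_high - lyg_low) / (adherence_high - adherence_low)


# mt-sDNA vs FIT at perfect adherence (abstract: 4.1% fewer LYG).
perfect = relative_change_pct(LYG_MTSDNA_100, LYG_FIT_100)

# mt-sDNA at 70% vs FIT at 40% adherence (abstract: 19.1% increase).
imperfect = relative_change_pct(LYG_MTSDNA_70, LYG_FIT_40)

# FIT sensitivity to adherence (abstract: 2.4 LYG per unit change).
fit_slope = lyg_per_adherence_point(LYG_FIT_100, LYG_FIT_10, 100, 10)
```

The mt-sDNA slope of 1.8 cannot be recomputed here because the abstract does not report the mt-sDNA LYG at 10% adherence.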
to make reproductive and healthcare decisions. Screening for breast/ovarian cancer in older women may offer lower value in isolation, but its cost-effectiveness should be assessed within the context of a broader screening panel for other diseases.