An explainable machine learning framework for lung cancer hospital length of stay prediction

Alsinglawi, Belal; Alshari, Osama; Alorjani, Mohammed; Mubin, Omar; Alnajjar, Fady; Novoa, Mauricio; Darwish, Omar

doi:10.1038/s41598-021-04608-7

Cited by 91 publications

(40 citation statements)

References 37 publications

“…It ended up with an R² score of 0.729. Alsinglawi et al. [14] constructed a LOS prediction framework for lung cancer patients using RF and oversampling techniques (SMOTE and ADASYN). The framework gets an AUC score of 100% on the MIMIC-III dataset.…”

Section: Related Work (mentioning)

confidence: 99%

“…c₁ and c₂ are the mean values of the target feature corresponding to R₁(Aᵢ, S) and R₂(Aᵢ, S), respectively (13). The next step of the algorithm is to find which S can make the MSE of the feature minimum (14) and then use the segmentation point S together with the feature as the node of the tree. After the algorithm divides all features, the CART regression tree uses the average of all leaf nodes as the output (15) [42].…”

Section: Ridge Regression (mentioning)

confidence: 99%
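
The split search described in the citation statement above can be illustrated with a minimal sketch. It assumes the standard CART formulation for a single numeric feature: each candidate split point S divides the samples into two regions, each region predicts the mean of its own targets (c₁ and c₂), and the chosen S is the one with the smallest summed squared error. The function name `best_split` and the use of midpoints between distinct feature values as candidates are illustrative assumptions, not details taken from the cited paper.

```python
from typing import List, Optional, Tuple


def best_split(values: List[float], targets: List[float]) -> Tuple[Optional[float], float]:
    """Return (split point S, summed squared error) for one numeric feature.

    Hypothetical helper: candidate split points are the midpoints between
    consecutive distinct feature values (an assumption of this sketch).
    Returns (None, inf) when the feature has fewer than two distinct values.
    """
    best_s: Optional[float] = None
    best_err = float("inf")
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        s = (lo + hi) / 2
        # R1 holds targets with feature value <= S, R2 the rest.
        left = [t for v, t in zip(values, targets) if v <= s]
        right = [t for v, t in zip(values, targets) if v > s]
        # c1 and c2 are the mean targets of the two regions.
        c1 = sum(left) / len(left)
        c2 = sum(right) / len(right)
        err = sum((t - c1) ** 2 for t in left) + sum((t - c2) ** 2 for t in right)
        if err < best_err:
            best_s, best_err = s, err
    return best_s, best_err


if __name__ == "__main__":
    s, err = best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5])
    print(s, err)
```

A full regression tree would repeat this search over every feature and recurse into each region; the sketch covers only one split on one feature.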

Length of Stay Prediction Model of Indoor Patients Based on Light Gradient Boosting Machine

Zeng

2022

Computational Intelligence and Neuroscience

The influx of hospital patients has become common in recent years. Hospital management departments need to redeploy healthcare resources to meet the massive medical needs of patients. In this process, the hospital length of stay (LOS) of different patients is a crucial reference for the management department. Therefore, building a model to predict LOS is of great significance. Five machine learning (ML) algorithms, namely Lasso regression (LR), ridge regression (RR), random forest regression (RFR), light gradient boosting machine (LightGBM), and extreme gradient boosting regression (XGBR), and six feature encoding methods, namely label encoding, count encoding, one-hot encoding, target encoding, leave-one-out encoding, and the proposed encoding method, are used to construct the regression prediction model. The prediction model is built with the Scikit-Learn toolbox on the Python platform. The input is the Hospital Inpatient Discharges (SPARCS De-Identified) 2017 dataset, with 2,343,569 instances, provided by the New York State Department of Health; it is used to verify the model after removing the 2.2% of the data that is missing. Mean squared error (MSE) and the coefficient of determination (R²) serve as the performance measures. The results show that the model with the LightGBM algorithm and the proposed encoding method has the best R² (96.0%) and MSE (2.231).
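
Three of the standard encodings named in the abstract above (label, count, and leave-one-out) can be sketched in plain Python to show how they differ. This is an illustrative sketch only: the function names are hypothetical, the fallback to the global target mean for categories that occur once is an assumption of this sketch, and the paper's proposed encoding method is not reproduced because the abstract does not describe it.

```python
from collections import Counter, defaultdict
from typing import List


def label_encode(categories: List[str]) -> List[int]:
    """Map each category to an integer index (sorted order, an arbitrary choice)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories]


def count_encode(categories: List[str]) -> List[int]:
    """Replace each category with the number of times it occurs."""
    counts = Counter(categories)
    return [counts[c] for c in categories]


def leave_one_out_encode(categories: List[str], targets: List[float]) -> List[float]:
    """Replace each category with the mean target of the *other* rows sharing it.

    Assumption of this sketch: a category seen only once falls back to the
    global target mean, since no other rows are available.
    """
    counts = Counter(categories)
    sums = defaultdict(float)
    for c, t in zip(categories, targets):
        sums[c] += t
    global_mean = sum(targets) / len(targets)
    encoded = []
    for c, t in zip(categories, targets):
        n = counts[c]
        encoded.append((sums[c] - t) / (n - 1) if n > 1 else global_mean)
    return encoded


if __name__ == "__main__":
    cats = ["a", "b", "a", "c"]
    los = [2.0, 4.0, 6.0, 8.0]  # made-up length-of-stay values for illustration
    print(label_encode(cats))
    print(count_encode(cats))
    print(leave_one_out_encode(cats, los))
```

Leave-one-out encoding excludes each row's own target from its category mean, which is one common way to limit target leakage compared with plain target encoding.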

“…Although many studies have used EHR data, most of them have only used quantitative EHR data [ 8 , 9 , 10 , 11 ]. In fact, 80% of EHR data comprises semi-structured data such as patients’ physiological conditions (free-text notes and clinician progress notes) at the time of their visits [ 12 ].…”

Section: Introduction (mentioning)

confidence: 99%

Predicting the Mortality of ICU Patients by Topic Model with Machine-Learning Techniques

Chiu

Chien

et al. 2022

Healthcare

Predicting patients’ vital signs is a leading issue in studies of intensive care units (ICUs). Early prediction of the mortality of ICU patients can reduce the overall mortality and cost of complication treatment. Some studies have predicted mortality based on electronic health record (EHR) data by using machine learning models. However, the semi-structured data (i.e., patients’ diagnosis data and inspection reports) is rarely used in these models. This study utilized data from the Medical Information Mart for Intensive Care III. We used a Latent Dirichlet Allocation (LDA) model to classify text in the semi-structured data into particular topics, and then established and compared five models: classification and regression trees (CART), logistic regression (LR), multivariate adaptive regression splines (MARS), random forest (RF), and gradient boosting (GB). A total of 46,520 ICU patients were included, with 11.5% mortality in the Medical Information Mart for Intensive Care III group. Our results revealed that the semi-structured data (diagnosis data and inspection reports) of ICU patients contain useful information that can assist clinical doctors in making critical clinical decisions. In addition, in our comparison of five machine learning models (CART, LR, MARS, RF, and GB), the GB model showed the best performance with the highest area under the receiver operating characteristic curve (AUROC) (0.9280), specificity (93.16%), and sensitivity (83.25%). The RF, LR, and MARS models showed better performance (AUROCs of 0.9096, 0.8987, and 0.8935, respectively) than the CART (0.8511). The GB model showed better performance than other machine learning models (CART, LR, MARS, and RF) in predicting the mortality of patients in the intensive care unit. The analysis results could be used to develop a clinically useful decision support system.

“…We hypothesize that integrating H&E image data with other data modalities can improve risk stratification since clinical variables, mutation status, and gene expression profiles have individually been shown to be informative [23]. To address this question, we develop and evaluate integrative deep learning models that combine morphological features from H&E WSIs, clinical variables, MSI-status, and mutation status of key genes [24–31].…”

Section: Introduction (mentioning)

confidence: 99%

Integrative deep learning analysis improves colon adenocarcinoma patient stratification at risk for mortality

Zhou

pour

Deirawan

et al. 2022

Preprint

Colorectal cancers are the fourth most commonly diagnosed cancer and the second leading cancer in number of deaths. Many clinical variables, pathological features, and genomic signatures are associated with patient risk, but reliable patient stratification in the clinic remains a challenging task. Here we assess how image, clinical, and genomic features can be combined to predict risk. We first observe that deep learning models based only on whole slide images (WSIs) from The Cancer Genome Atlas accurately separate high risk (OS < 3 years, N = 38) from low risk (OS > 5 years, N = 25) patients (AUC = 0.81 ± 0.08, 5-year survival p-value = 2.13e-25, 5-year relative risk = 5.09 ± 0.05), though such models are less effective at predicting OS for moderate risk (3 years < OS < 5 years, N = 45) patients (5-year survival p-value = 0.5, 5-year relative risk = 1.32 ± 0.09). However, we find that novel integrative models combining whole slide images, clinical variables, and mutation signatures can improve patient stratification for moderate risk patients (5-year survival p-value = 6.69e-30, 5-year relative risk = 5.32 ± 0.07). Our integrative model combining image and clinical variables is also effective on an independent pathology dataset generated by our team (3-year survival p-value = 1.14e-09, 5-year survival p-value = 2.15e-05, 3-year relative risk = 3.25 ± 0.06, 5-year relative risk = 3.07 ± 0.08). The integrative model substantially outperforms models using only images or only clinical variables, indicating beneficial cross-talk between the data types. Pathologist review of image-based heatmaps suggests that nuclear shape, nuclear size pleomorphism, intense cellularity, and abnormal structures are associated with high risk, while low risk regions tend to have more regular and small cells. The improved stratification of colorectal cancer patients from our computational methods can be beneficial for preemptive development of management and treatment plans for individual patients, as well as for informed enrollment of patients in clinical trials.
