Background: General severity-of-illness scores are not well calibrated to predict mortality among patients receiving renal replacement therapy (RRT) for acute kidney injury (AKI). We developed machine learning models to predict mortality and compared their performance with that of the Sequential Organ Failure Assessment (SOFA) and HEpatic failure, LactatE, NorepInephrine, medical Condition, and Creatinine (HELENICC) scores. Methods: We extracted routinely collected clinical data for AKI patients requiring RRT from the MIMIC and eICU databases. The models were trained on 80% of the pooled dataset and tested on the remaining 20%. We compared the areas under the receiver operating characteristic curve (AUCs) of four machine learning models (multilayer perceptron [MLP], logistic regression, XGBoost, and random forest [RF]) with those of the SOFA, nonrenal SOFA, and HELENICC scores and assessed calibration, sensitivity, specificity, positive (PPV) and negative (NPV) predictive values, and accuracy. Results: Among the machine learning models, XGBoost had the highest AUC for mortality in the testing dataset (0.823; 95% confidence interval [CI], 0.791–0.854) and the highest accuracy (0.758). The XGBoost model showed no evidence of lack of fit with the Hosmer–Lemeshow test (p > 0.05). Conclusion: XGBoost outperformed previous scoring systems in predicting mortality among patients with AKI requiring RRT.
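The AUC used above to compare models and scores can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that pairwise definition, not the authors' code; the function name `roc_auc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = event, e.g. death).
    scores: predicted risks, higher meaning more likely to be positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for illustration only.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.7, 0.4, 0.2, 0.35, 0.6]
toy_auc = roc_auc(toy_labels, toy_scores)  # 7 of 9 pairs ranked correctly
```

This pairwise form is quadratic in the number of patients, so it suits small illustrations; rank-based implementations give the same value more efficiently on large cohorts.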
A biomedically explainable and validatable text mining pipeline can aid cancer gene panel discovery. We create a pipeline that contextualizes genes using text-mined co-occurrence features. We apply Biomedical Natural Language Processing (BioNLP) techniques to mine the literature on cancer gene panels. We built a literature-derived 4,679 × 4,630 gene-by-term feature matrix. The genetic variants EGFR L858R, EGFR T790M, and BRAF V600E are important mutation term features in the text mining results and are frequently mutated in cancer. We validate the cancer gene panel against the mutational landscape of different cancer types. The cosine similarity between gene frequencies from text mining and those from a statistical analysis of clinical sequencing data is 80.8%. Across different machine learning models, the best accuracy for predicting two gene panels, MSK-IMPACT (Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets) and the Oncomine cancer gene panel, is 0.959 and 0.989, respectively. Receiver operating characteristic (ROC) curve analysis confirmed that the neural net model has a better prediction performance (area under the ROC curve [AUC] = 0.992). Text-mined co-occurrence features can contextualize each gene. We believe the approach can be used to evaluate several existing gene panels, and we show that part of a gene panel set can be used to predict the remaining genes for cancer discovery.
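The cosine similarity reported above compares two gene-frequency vectors, one from text mining and one from clinical sequencing data. A minimal sketch of that calculation follows, assuming both vectors are indexed by the same ordered gene list; the function name `cosine_similarity` and the counts are hypothetical and are not data from the study.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical per-gene counts in a shared gene order, for illustration only.
genes = ["EGFR", "BRAF", "KRAS", "TP53"]
text_mining_counts = [120, 80, 95, 200]
sequencing_counts = [60, 30, 70, 150]
similarity = cosine_similarity(text_mining_counts, sequencing_counts)
```

Because cosine similarity ignores vector length, it compares the relative distribution of gene frequencies rather than their absolute counts, which is useful when the two sources differ greatly in size.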