Tree-based machine learning models such as random forests, decision trees, and gradient-boosted trees are popular non-linear predictive models, yet comparatively little attention has been paid to explaining their predictions. Here, we improve the interpretability of tree-based models through three main contributions: 1) the first polynomial-time algorithm to compute optimal explanations based on game theory; 2) a new type of explanation that directly measures local feature interaction effects; and 3) a new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high-magnitude but low-frequency non-linear mortality risk factors in the US population, ii) highlight distinct population subgroups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.
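The abstract above does not name the game-theoretic quantity it computes. Assuming it refers to Shapley values, the sketch below shows the brute-force definition on a toy problem. It is not the polynomial-time tree algorithm the abstract announces: it enumerates every feature subset, so its cost grows exponentially with the number of features, which is the cost such an algorithm would avoid. The function names, the feature names ("age", "bp"), and the toy value function are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating all subsets (exponential cost)."""
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(present: FrozenSet[str]) -> float:
    """Hypothetical model output when only the features in `present` are known."""
    out = 0.0
    if "age" in present:
        out += 2.0
    if "bp" in present:
        out += 1.0
    if "age" in present and "bp" in present:
        out += 1.0  # interaction term, split evenly between the two features
    return out


if __name__ == "__main__":
    print(shapley_values(["age", "bp"], toy_value))
```

In this toy case the attributions add up to the difference between the output with all features known and the output with none known, and the interaction term is shared equally by the two features that produce it.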
Although anaesthesiologists strive to avoid hypoxemia during surgery, reliably predicting future intraoperative hypoxemia is not currently possible. Here, we report the development and testing of a machine-learning-based system that, in real time during general anaesthesia, predicts the risk of hypoxemia and provides explanations of the risk factors. The system, which was trained on minute-by-minute data from the electronic medical records of over fifty thousand surgeries, improved the performance of anaesthesiologists when it provided them with interpretable hypoxemia risks and contributing factors. The explanations for the predictions are broadly consistent with the literature and with prior knowledge from anaesthesiologists. Our results suggest that if anaesthesiologists currently anticipate 15% of hypoxemia events, with this system’s assistance they would anticipate 30% of them, a large portion of which may benefit from early intervention because they are associated with modifiable factors. The system can help improve the clinical understanding of hypoxemia risk during anaesthesia care by providing general insights into the exact changes in risk induced by certain patient or procedure characteristics.
Background: Accurately estimating operative case-time duration is critical for optimizing operating room utilization. Current estimates are inaccurate and prior models include data not available at the time of scheduling. Our objective was to develop statistical models in a large retrospective dataset to improve estimation of case-time duration relative to current standards. Study Design: We developed models to predict case-time duration using linear regression and supervised machine learning (ML). For each of these models, we generated: 1) service-specific models and 2) surgeon-specific models in which surgeons were modeled individually. Our dataset included 46,986 scheduled surgeries performed at our center from January 2014 to December 2017, with 80% used for training and 20% for model testing/validation. Predictions derived from each model were compared to our institutional standard. Models were evaluated based on accuracy, overage (case duration > predicted + 10%), underage (case duration < predicted - 10%), and the predictive capability of being within a 10% tolerance threshold. Results: The ML algorithm resulted in the highest predictive capability. The surgeon-specific model was superior to the service-specific model, with higher accuracies, lower percentage of overage and underage, and higher percentage of cases within the 10% threshold. The ability to predict cases within 10% improved from 32% using our institutional standard to 39% with the ML surgeon-specific model. The majority of the information utilized in the models was based on procedure and personnel data rather than patient health status.
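The overage, underage, and within-tolerance measures described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes "predicted + 10%" means 110% of the predicted duration (and "predicted - 10%" means 90% of it), and the function name and sample durations are hypothetical.

```python
from typing import Dict, Sequence


def tolerance_metrics(
    actual: Sequence[float],
    predicted: Sequence[float],
    tol: float = 0.10,
) -> Dict[str, float]:
    """Fraction of cases over, under, and within a relative tolerance band."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    n = len(actual)
    # Overage: the case ran longer than the prediction plus the tolerance.
    over = sum(a > p * (1 + tol) for a, p in zip(actual, predicted))
    # Underage: the case finished sooner than the prediction minus the tolerance.
    under = sum(a < p * (1 - tol) for a, p in zip(actual, predicted))
    within = n - over - under
    return {"overage": over / n, "underage": under / n, "within": within / n}


if __name__ == "__main__":
    # Hypothetical case durations in minutes.
    actual_minutes = [100, 130, 80, 60]
    predicted_minutes = [100, 100, 100, 62]
    print(tolerance_metrics(actual_minutes, predicted_minutes))
```

Under this reading, the three fractions always sum to 1, so the reported improvement from 32% to 39% within tolerance necessarily comes with a matching drop in combined overage and underage.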