For complex machine learning (ML) algorithms to gain widespread acceptance in decision making, we must be able to identify the features driving the predictions. Explainability models allow transparency of ML algorithms; however, their reliability within high-dimensional data is unclear. To test the reliability of the explainability model SHapley Additive exPlanations (SHAP), we developed a convolutional neural network to predict tissue classification from Genotype-Tissue Expression (GTEx) RNA-seq data representing 16,651 samples from 47 tissues. Our classifier achieved an average F1 score of 96.1% on held-out GTEx samples. Using SHAP values, we identified the 2,423 most discriminatory genes, of which 98.6% were also identified by differential expression analysis across all tissues. The SHAP genes reflected expected biological processes involved in tissue differentiation and function. Moreover, SHAP genes clustered tissue types with superior performance when compared to all genes, genes detected by differential expression analysis, or random genes. We demonstrate the utility and reliability of SHAP to explain a deep learning model and highlight the strengths of applying ML to transcriptome data.
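As a rough illustration of the idea behind SHAP, the sketch below computes exact Shapley values for a toy three-feature model by enumerating every feature subset. It is not the authors' pipeline and does not use the SHAP library; the names `shapley_values`, `toy_value`, and `WEIGHTS` are hypothetical, and brute-force enumeration is only feasible for a handful of features, which is why approximations are needed for thousands of genes.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over feature indices 0..n-1."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical additive toy model: each present feature adds a fixed weight.
WEIGHTS = [2.0, 0.5, 0.0]


def toy_value(subset):
    return sum(WEIGHTS[j] for j in subset)


phi = shapley_values(toy_value, 3)
print(phi)
```

For an additive model like this one, each feature's Shapley value equals its own weight, and the values sum to the difference between the full-model output and the empty-set output.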
Background Acute kidney injury (AKI) is one of the most common and significant problems in patients with Coronavirus Disease 2019 (COVID-19). However, little is known about the incidence and impact of AKI occurring in the community or early in the hospital admission. The traditional Kidney Disease Improving Global Outcomes (KDIGO) definition can fail to identify patients for whom hospitalisation coincides with recovery of AKI as manifested by a decrease in serum creatinine (sCr). We hypothesised that an extended KDIGO (eKDIGO) definition, adapted from the International Society of Nephrology (ISN) 0by25 studies, would identify more cases of AKI in patients with COVID-19 and that these may correspond to community-acquired AKI (CA-AKI) with similarly poor outcomes as previously reported in this population. Methods and findings All individuals recruited using the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC)–World Health Organization (WHO) Clinical Characterisation Protocol (CCP) and admitted to 1,609 hospitals in 54 countries with Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection from February 15, 2020 to February 1, 2021 were included in the study. Data were collected and analysed for the duration of a patient’s admission. Incidence, staging, and timing of AKI were evaluated using a traditional and eKDIGO definition, which incorporated a commensurate decrease in sCr. Patients within eKDIGO diagnosed with AKI by a decrease in sCr were labelled as deKDIGO. Clinical characteristics and outcomes—intensive care unit (ICU) admission, invasive mechanical ventilation, and in-hospital death—were compared for all 3 groups of patients. The relationship between eKDIGO AKI and in-hospital death was assessed using survival curves and logistic regression, adjusting for disease severity and AKI susceptibility. A total of 75,670 patients were included in the final analysis cohort. 
Median length of admission was 12 days (interquartile range [IQR] 7, 20). There were twice as many patients with AKI identified by eKDIGO as by KDIGO (31.7% versus 16.8%). Those in the eKDIGO group had a greater proportion of stage 1 AKI (58% versus 36% in KDIGO patients). Peak AKI occurred early in the admission more frequently among eKDIGO than KDIGO patients. Compared to those without AKI, patients in the eKDIGO group had worse renal function on admission, more in-hospital complications, higher rates of ICU admission (54% versus 23%) and invasive ventilation (45% versus 15%), and increased mortality (38% versus 19%). Patients in the eKDIGO group had a higher risk of in-hospital death than those without AKI (adjusted odds ratio: 1.78, 95% confidence interval: 1.71 to 1.80, p-value < 0.001). Mortality and rate of ICU admission were lower among deKDIGO than KDIGO patients (25% versus 50% death and 35% versus 70% ICU admission) but significantly higher when compared to patients with no AKI (25% versus 19% death and 35% versus 23% ICU admission) (all p-values < 5 × 10⁻⁵). Limitations include ad hoc sCr sampling, exclusion of patients with fewer than 2 sCr measurements, and limited availability of sCr measurements prior to initiation of acute dialysis. Conclusions An extended KDIGO definition of AKI resulted in a significantly higher detection rate in this population. These additional cases of AKI occurred early in the hospital admission and were associated with worse outcomes compared to patients without AKI.
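The difference between the two definitions can be sketched in a few lines. This is a simplified illustration only, not the study's algorithm: it uses the standard KDIGO sCr criteria (a rise of at least 0.3 mg/dL within 48 hours, or a rise to at least 1.5 times an earlier value within 7 days) and assumes that the "commensurate decrease" mirrors those thresholds. The function name `classify_aki`, the mirrored thresholds, and the rule that a rise takes precedence over a fall are all assumptions; urine output criteria and staging are ignored.

```python
def classify_aki(measurements):
    """Classify a series of (hours_since_admission, scr_mg_dl) pairs.

    Assumes the list is sorted by time. Returns "KDIGO" for a qualifying
    rise, "deKDIGO" for a qualifying fall only, and "no AKI" otherwise.
    """
    rise = False
    fall = False
    for i, (t0, c0) in enumerate(measurements):
        for t1, c1 in measurements[i + 1:]:
            dt = t1 - t0
            # Standard KDIGO sCr criteria: absolute or relative rise
            if (dt <= 48 and c1 - c0 >= 0.3) or (dt <= 168 and c1 >= 1.5 * c0):
                rise = True
            # Assumed mirror image for the extended definition: a fall
            if (dt <= 48 and c0 - c1 >= 0.3) or (dt <= 168 and c0 >= 1.5 * c1):
                fall = True
    if rise:
        return "KDIGO"  # assumption: a rise takes precedence over a fall
    if fall:
        return "deKDIGO"
    return "no AKI"


rising = classify_aki([(0, 1.0), (24, 1.5)])
falling = classify_aki([(0, 2.0), (24, 1.2)])
stable = classify_aki([(0, 1.0), (24, 1.1)])
print(rising, falling, stable)
```

Under these assumptions, a patient admitted with a high sCr that then falls would be missed by the rise-only rule but captured by the extended one, which is the group the abstract labels deKDIGO.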
Spatial transcriptomic (ST) data enable us to link tissue morphological features with thousands of unseen gene expression values, opening a horizon for breakthroughs in digital pathology. Models to predict the presence/absence, high/low, or continuous expression of a gene using images as the only input have huge potential for clinical applications, but such models require improvements in accuracy, interpretability, and robustness. We developed STimage models to estimate parameters of gene expression as distributions rather than fixed data points, thereby allowing for the essential quantification of uncertainty in the predicted results. We assessed aleatoric and epistemic uncertainty of the models across a diverse range of test cases and proposed an ensemble approach to improve model performance and trust. STimage can train prediction models for one gene marker or a panel of markers and provides important interpretability analyses at the single-cell level and in the context of histopathological annotation. Through comprehensive benchmarking against existing models, we found that STimage is more robust to technical variation in platforms, data types, and sample types. Using images from The Cancer Genome Atlas, we showed that STimage can be applied to non-spatial omics data. STimage also performs better than other models when only a small training dataset is available. Overall, STimage contributes an important methodological advance needed for the potential application of spatial technology in cancer digital pathology.
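One common way to separate the two kinds of uncertainty in an ensemble is the law of total variance: the average of the members' predicted variances approximates aleatoric uncertainty, and the spread of their predicted means approximates epistemic uncertainty. The sketch below shows that decomposition for a single predicted value. It is an illustration under that assumption and may not match how STimage computes these quantities; `decompose_uncertainty` and the example numbers are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split ensemble predictive variance into aleatoric and epistemic parts.

    Each ensemble member is assumed to output a predicted mean and a
    predicted variance for the same input.
    """
    aleatoric = fmean(member_variances)  # average noise the members predict
    epistemic = pvariance(member_means)  # disagreement between members
    return {
        "mean": fmean(member_means),
        "aleatoric": aleatoric,
        "epistemic": epistemic,
        "total": aleatoric + epistemic,
    }


# Hypothetical outputs of a three-member ensemble for one spot and one gene.
result = decompose_uncertainty([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(result)
```

Here the members agree on the noise level but disagree on the mean, so most of the total variance is epistemic; more training data would be expected to shrink that term but not the aleatoric one.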