Improved identification of bacterial and viral infections would reduce morbidity from sepsis, reduce antibiotic overuse, and lower healthcare costs. Here, we develop a generalizable host-gene-expression-based classifier for acute bacterial and viral infections. We use training data (N = 1069) from 18 retrospective transcriptomic studies. Using only 29 preselected host mRNAs, we train a neural-network classifier with a bacterial-vs-other area under the receiver-operating characteristic curve (AUROC) of 0.92 (95% CI 0.90-0.93) and a viral-vs-other AUROC of 0.92 (95% CI 0.90-0.93). We then apply this classifier, inflammatix-bacterial-viral-noninfected-version 1 (IMX-BVN-1), without retraining, to an independent cohort (N = 163). In this cohort, IMX-BVN-1 AUROCs are: bacterial-vs-other 0.86 (95% CI 0.77-0.93), and viral-vs-other 0.85 (95% CI 0.76-0.93). In patients enrolled within 36 h of hospital admission (N = 70), IMX-BVN-1 AUROCs are: bacterial-vs-other 0.92 (95% CI 0.83-0.99), and viral-vs-other 0.91 (95% CI 0.82-0.98). With further study, IMX-BVN-1 could provide a tool for assessing patients with suspected infection and sepsis at hospital admission.
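To make the evaluation concrete, the sketch below trains a small multilayer-perceptron classifier on 29 simulated expression features and computes one-vs-other AUROCs analogous to the bacterial-vs-other and viral-vs-other metrics reported above. The architecture, simulated data, and class encoding are assumptions for illustration only; they do not reproduce the actual IMX-BVN-1 model or cohorts.

```python
# Illustrative sketch only: a small neural network on 29 host-mRNA features with
# one-vs-other AUROC evaluation. Architecture and data are placeholders, not IMX-BVN-1.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_train, n_test, n_genes = 1069, 163, 29           # cohort sizes taken from the abstract
X_train = rng.normal(size=(n_train, n_genes))       # stand-in for normalized expression values
y_train = rng.integers(0, 3, size=n_train)          # 0 = noninfected, 1 = bacterial, 2 = viral
X_test = rng.normal(size=(n_test, n_genes))
y_test = rng.integers(0, 3, size=n_test)

clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)                   # columns ordered as in clf.classes_

# One-vs-other AUROCs, mirroring the bacterial-vs-other / viral-vs-other metrics
auroc_bacterial = roc_auc_score((y_test == 1).astype(int), proba[:, 1])
auroc_viral = roc_auc_score((y_test == 2).astype(int), proba[:, 2])
print(f"bacterial-vs-other AUROC: {auroc_bacterial:.2f}")
print(f"viral-vs-other AUROC: {auroc_viral:.2f}")
```

On random data these AUROCs hover near 0.5; the point is only to show how a three-class probability output maps onto the two one-vs-other metrics used in the study.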
Host-response gene expression measurements may carry confounding associations with patient demographic characteristics that can induce bias in downstream classifiers. Assessment of deployed machine learning systems in other domains has revealed the presence of such biases and exposed the potential of these systems to cause harm. Such an assessment of a gene-expression-based classifier has not been carried out, and the requisite patient subgroup data have not been collated. Here, we present data resources and an auditing framework for patient subgroup analysis of diagnostic classifiers of acute infection. Our dataset comprises demographic characteristics of nearly 6500 patients across 49 studies. We leverage these data to detect differences across patient subgroups in gene-expression-based host response and in the performance of both our candidate pre-market diagnostic classifier and a standard-of-care biomarker of acute infection. We find evidence of variable representation with respect to patient covariates in our multi-cohort datasets, as well as differences in host-response marker expression across patient subgroups. We also detect differences in performance of multiple host-response-based diagnostics for acute infection. This analysis marks an important first step in our ongoing efforts to characterize and mitigate potential bias in machine-learning-based host-response diagnostics, highlighting the importance of accounting for such bias in developing diagnostic tests that generalize well across diverse patient populations.
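A minimal version of such a subgroup audit is sketched below: it computes a per-subgroup AUROC with bootstrap confidence intervals for a given diagnostic score. The column names (`sex`, `age_group`), the label, and the simulated scores are hypothetical; this is an assumed workflow, not the paper's actual auditing framework or data.

```python
# Hedged sketch of a subgroup performance audit: AUROC per demographic subgroup
# with bootstrap confidence intervals. Column names and data are hypothetical.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_auroc(df, group_col, label_col="bacterial", score_col="score",
                   n_boot=1000, seed=0):
    """Return point-estimate AUROC and a 95% bootstrap CI for each subgroup."""
    rng = np.random.default_rng(seed)
    rows = []
    for group, sub in df.groupby(group_col):
        y, s = sub[label_col].to_numpy(), sub[score_col].to_numpy()
        point = roc_auc_score(y, s)
        boots = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y), size=len(y))
            if len(np.unique(y[idx])) < 2:          # skip resamples with one class only
                continue
            boots.append(roc_auc_score(y[idx], s[idx]))
        lo, hi = np.percentile(boots, [2.5, 97.5])
        rows.append({group_col: group, "n": len(sub), "auroc": point,
                     "ci_low": lo, "ci_high": hi})
    return pd.DataFrame(rows)

# Toy example with simulated scores; in practice these would be classifier outputs
# or a standard-of-care biomarker value (e.g., procalcitonin) per patient.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "sex": rng.choice(["female", "male"], size=500),
    "age_group": rng.choice(["<18", "18-65", ">65"], size=500),
    "bacterial": rng.integers(0, 2, size=500),
})
demo["score"] = demo["bacterial"] * 0.8 + rng.normal(size=500)
print(subgroup_auroc(demo, "sex"))
print(subgroup_auroc(demo, "age_group"))
```

Comparing the resulting confidence intervals across subgroups (and checking subgroup sample sizes) is one simple way to flag the representation and performance differences described above.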
Acute infection, if not rapidly and accurately detected, can lead to sepsis, organ failure and even death. Currently, detection of acute infection as well as assessment of a patient's severity of illness are based on imperfect (and often superficial) measures of patient physiology. Characterization of a patient's immune response by quantifying expression levels of key genes from blood represents a potentially more timely and precise means of detecting acute infection and severe illness. Machine learning methods provide a platform for development of deployment-ready classification models robust to the smaller, more heterogeneous datasets typical of healthcare. Identification of promising classifiers is dependent, in part, on hyperparameter optimization (HO), for which a number of approaches including grid search, random sampling and Bayesian optimization have been shown to be effective. In this analysis, we compare HO approaches for the development of diagnostic classifiers of acute infection and in-hospital mortality from gene expression of 29 diagnostic markers. Our comprehensive analysis of a multi-study patient cohort evaluates HO for three different