Background: The diagnosis of multidrug resistant and extensively drug resistant tuberculosis is a global health priority. Whole genome sequencing of clinical Mycobacterium tuberculosis isolates promises to circumvent the long wait times and limited scope of conventional phenotypic antimicrobial susceptibility testing, but gaps remain in predicting phenotype accurately from genotypic data, especially for certain drugs. Our primary aim was to explore statistical learning algorithms and genetic predictor sets on a rich dataset in order to build a high-performing, fast-predicting model for detecting anti-tuberculosis drug resistance.

Methods: We collected targeted or whole genome sequencing data and conventional drug resistance phenotyping data from 3601 Mycobacterium tuberculosis strains enriched for resistance to first- and second-line drugs, including 1228 multidrug resistant strains. We investigated the utility of (1) rare variants and variants known to be determinants of resistance for at least one drug and (2) machine and statistical learning architectures in predicting phenotypic resistance to 10 anti-tuberculosis drugs. Specifically, we investigated multitask and single-task wide and deep neural networks, a multilayer perceptron, regularized logistic regression, and random forest classifiers.

Findings: The highest performing machine and statistical learning methods included both rare variants and variants known to be causal of resistance for at least one drug. Both simpler L2-penalized regression and complex machine learning models had high predictive performance. The average AUCs for our highest performing model were 0.979 for first-line drugs and 0.936 for second-line drugs during repeated cross-validation. On an independent validation set, the highest performing model showed average AUCs, sensitivities, and specificities, respectively, of 0.937, 87.9%, and 92.7% for first-line drugs and 0.891, 82.0%, and 90.1% for second-line drugs. Our method outperforms existing approaches based on direct association, increasing the sum of sensitivity and specificity by 11.7% for first-line drugs and 3.2% for second-line drugs. It also has higher predictive performance than previously reported machine learning models during cross-validation, with higher AUCs for 8 of 10 drugs.

Interpretation: Statistical models, especially those trained using both frequent and less frequent variants, significantly improve the accuracy of resistance prediction and hold promise in bringing sequencing technologies closer to the bedside.
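The reported metrics (AUC, sensitivity, specificity, and the sum of sensitivity and specificity) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores below are all assumptions made for illustration, with 1 standing for a resistant isolate and 0 for a susceptible one.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = resistant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # resistant, called resistant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # resistant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # susceptible, called susceptible
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # susceptible, false alarm
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (resistant, susceptible) pairs ranked correctly.

    Ties between a resistant and a susceptible score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true phenotypes and predicted resistance probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

# Assumed decision threshold of 0.5 to turn probabilities into calls.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
combined = sens + spec  # the "sum of sensitivity and specificity"
area = auc(y_true, scores)
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why the abstract reports them separately.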
One sentence summary: A unified multitask deep learning model can be used to identify multidrug resistant Mycobacterium tuberculosis using sequencing data.

Abstract: The diagnosis of multidrug resistant and extensively drug resistant tuberculosis is a global health priority. Whole genome sequencing of clinical Mycobacterium tuberculosis isolates promises to circumvent the long wait times and limited scope of conventional phenotypic drug susceptibility testing, but gaps remain in predicting phenotype accurately from genotypic data. Using targeted or whole genome sequencing and conventional drug resistance phenotyping data from 3,601 Mycobacterium tuberculosis strains, 1,228 of which were multidrug resistant, we implemented the first multitask deep learning framework to predict phenotypic drug resistance to 10 anti-tubercular drugs. The proposed wide and deep neural network (WDNN) achieved improved predictive performance compared to regularized logistic regression and random forest: the average sensitivities and specificities, respectively, were 92.7% and 92.7% for first-line drugs and 82.0% and 92.8% for second-line drugs during cross-validation. On an independent validation set, the multitask WDNN showed significant performance gains over baseline models, with average sensitivities and specificities, respectively, of 84.5% and 93.6% for first-line drugs and 64.0% and 95.7% for second-line drugs. In addition to being able to learn from samples that have only been partially phenotyped, our proposed multitask architecture shares information across different anti-tubercular drugs and genes to provide a more accurate phenotypic prediction. We use t-distributed Stochastic Neighbor Embedding (t-SNE) visualization and feature importance analyses to examine inter-drug similarities. Deep learning has a clear role in improving drug resistance predictive performance over traditional methods and holds promise in bringing sequencing technologies closer to the bedside.
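Learning from partially phenotyped samples can be illustrated with a masked loss. The abstract does not say how the WDNN handles missing phenotypes, so the sketch below is an assumption, not the authors' implementation: a binary cross-entropy averaged only over the (isolate, drug) pairs that were actually tested, with `None` marking an untested drug. The function name `masked_bce` and the toy values are hypothetical.

```python
import math
from typing import List, Optional


def masked_bce(
    labels: List[List[Optional[int]]],
    probs: List[List[float]],
    eps: float = 1e-12,
) -> float:
    """Mean binary cross-entropy over observed (isolate, drug) entries only.

    labels[i][d] is 1 (resistant), 0 (susceptible), or None (not phenotyped).
    probs[i][d] is the predicted probability of resistance for that entry.
    """
    total, count = 0.0, 0
    for label_row, prob_row in zip(labels, probs):
        for y, p in zip(label_row, prob_row):
            if y is None:
                continue  # untested drug: contributes nothing to the loss
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


# Hypothetical toy batch: 2 isolates x 2 drugs; isolate 0 lacks a result for drug 1.
labels = [[1, None], [0, 1]]
probs = [[0.9, 0.2], [0.1, 0.8]]
loss = masked_bce(labels, probs)
```

Under this assumption an isolate tested against only some drugs still contributes to training for those drugs, while the shared layers of a multitask network are updated by every observed entry.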
The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Diagnosing drug resistance remains a barrier to providing appropriate TB treatment. Due to insufficient resources for building diagnostic laboratories, fewer than half of the countries with a high MDR-TB burden have modern diagnostic capabilities (3). Even in the best equipped laboratories, conventional culture and culture-based drug susceptibility testing (DST) constitute a considerable biohazard and require weeks to months before results are reported because of the slow growth of Mycobacterium tuberculosis in vitro (1). Molecular diagnostics are now an increasingly common alternative to conventional cultures. The WHO has endorsed three such molecular tests: the GeneXpert MTB/RIF, a rapid RT-PCR based diagnostic assay that detects RIF resistance; the Hain line probe assay (LPA), which tests for both RIF and INH resistance; and the Hain MDRTBsl, an LPA that tests for resistance to second-line in...