The Comprehensive Resistance Prediction for Tuberculosis: an International Consortium (CRyPTIC) presents here a data compendium of 12,289 Mycobacterium tuberculosis global clinical isolates, all of which have undergone whole-genome sequencing and have had their minimum inhibitory concentrations to 13 antitubercular drugs measured in a single assay. It is the largest matched phenotypic and genotypic dataset for M. tuberculosis to date. Here, we provide a summary detailing the breadth of data collected, along with a description of how the isolates were selected, collected, and uniformly processed in CRyPTIC partner laboratories across 23 countries. The compendium contains 6,814 isolates resistant to at least 1 drug, including 2,129 samples that fully satisfy the clinical definitions of rifampicin resistant (RR), multidrug resistant (MDR), pre-extensively drug resistant (pre-XDR), or extensively drug resistant (XDR). The data are enriched for rare resistance-associated variants, and the current limits of genotypic prediction of resistance status (sensitive/resistant) are presented by using a genetic mutation catalogue, along with the presence of suspected resistance-conferring mutations for isolates resistant to the newly introduced drugs bedaquiline, clofazimine, delamanid, and linezolid. Finally, a case study of rifampicin monoresistance demonstrates how this compendium could be used to advance our genetic understanding of rare resistance phenotypes. The data compendium is fully open source and it is hoped that it will facilitate and inspire future research for years to come.
is Changing This March, Public Health England will officially launch a routine whole-genome sequencing service for mycobacterial infections, becoming the first national administration in the world to do so. The implications are profound with many of the methods that have become commonplace around the world being replaced. From now on, when mycobacterial cultures are referred to reference facilities, comprehensive whole-genome sequencing-based reports will be returned within 5-7 days. Reports will detail mycobacterial species, and predict drug susceptibility and genetic relatedness for the Mycobacterium tuberculosis complex. The
Motivation Resistance co-occurrence within first-line anti-tuberculosis (TB) drugs is a common phenomenon. Existing methods based on genetic data analysis of Mycobacterium tuberculosis (MTB) have been able to predict resistance of MTB to individual drugs, but have not considered the resistance co-occurrence and cannot capture latent structure of genomic data that corresponds to lineages. Results We used a large cohort of TB patients from 16 countries across six continents where whole-genome sequences for each isolate and associated phenotype to anti-TB drugs were obtained using drug susceptibility testing recommended by the World Health Organization. We then proposed an end-to-end multi-task model with deep denoising auto-encoder (DeepAMR) for multiple drug classification and developed DeepAMR_cluster, a clustering variant based on DeepAMR, for learning clusters in latent space of the data. The results showed that DeepAMR outperformed baseline model and four machine learning models with mean AUROC from 94.4% to 98.7% for predicting resistance to four first-line drugs [i.e. isoniazid (INH), ethambutol (EMB), rifampicin (RIF), pyrazinamide (PZA)], multi-drug resistant TB (MDR-TB) and pan-susceptible TB (PANS-TB: MTB that is susceptible to all four first-line anti-TB drugs). In the case of INH, EMB, PZA and MDR-TB, DeepAMR achieved its best mean sensitivity of 94.3%, 91.5%, 87.3% and 96.3%, respectively. While in the case of RIF and PANS-TB, it generated 94.2% and 92.2% sensitivity, which were lower than baseline model by 0.7% and 1.9%, respectively. t-SNE visualization shows that DeepAMR_cluster captures lineage-related clusters in the latent space. Availability and implementation The details of source code are provided at http://www.robots.ox.ac.uk/∼davidc/code.php. Supplementary information Supplementary data are available at Bioinformatics online.
The dN/dS ratio provides evidence of adaptation or functional constraint in protein-coding genes by quantifying the relative excess or deficit of amino acid-replacing versus silent nucleotide variation. Inexpensive sequencing promises a better understanding of parameters, such as dN/dS, but analyzing very large data sets poses a major statistical challenge. Here, I introduce genomegaMap for estimating within-species genome-wide variation in dN/dS, and I apply it to 3,979 genes across 10,209 tuberculosis genomes to characterize the selection pressures shaping this global pathogen. GenomegaMap is a phylogeny-free method that addresses two major problems with existing approaches: 1) It is fast no matter how large the sample size and 2) it is robust to recombination, which causes phylogenetic methods to report artefactual signals of adaptation. GenomegaMap uses population genetics theory to approximate the distribution of allele frequencies under general, parent-dependent mutation models. Coalescent simulations show that substitution parameters are well estimated even when genomegaMap’s simplifying assumption of independence among sites is violated. I demonstrate the ability of genomegaMap to detect genuine signatures of selection at antimicrobial resistance-conferring substitutions in Mycobacterium tuberculosis and describe a novel signature of selection in the cold-shock DEAD-box protein A gene deaD/csdA. The genomegaMap approach helps accelerate the exploitation of big data for gaining new insights into evolution within species.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.