…omas, Fernando González Candelas, SeqCOVID-SPAIN consortium, Tanja Stadler & Richard A. Neher
This is a PDF file of a peer-reviewed paper that has been accepted for publication. Although unedited, the content has been subjected to preliminary formatting. Nature is providing this early version of the typeset paper as a service to our authors and readers. The text and figures will undergo copyediting and a proof review before the paper is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers apply.
Whole-genome sequencing (WGS) of Mycobacterium tuberculosis has rapidly evolved from a research tool into a clinical application for the diagnosis and management of tuberculosis and for public health surveillance. This evolution has been facilitated by a dramatic drop in costs, advances in technology, and concerted efforts to translate sequencing data into actionable information. However, in the absence of a consensus and international standards, there is a risk that the widespread use of WGS technology may result in data and processes that lack harmonisation, comparability and validation. In this review, we outline the current landscape of WGS pipelines and applications and set out best practices for M. tuberculosis WGS, including standards for bioinformatics pipelines, a curated repository of resistance-causing variants, phylogenetic analyses, quality control processes, and standardised reporting.
1. Introduction
Mycobacterium tuberculosis complex (Mtbc) pathogens are collectively the top infectious disease killer globally, causing 10 million new tuberculosis (TB) cases annually¹. Increasingly, new TB cases are already resistant to rifampicin and isoniazid, the key first-line drugs (termed multidrug resistance; MDR-TB)¹. Tackling the spread and drug-resistance burden of this pathogen requires a concerted global effort in prevention, diagnosis, treatment and surveillance.
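To make the idea of a curated repository of resistance-causing variants concrete, the following is a minimal sketch of how variant calls might be matched against such a catalogue and checked against the MDR-TB definition given above (resistance to both rifampicin and isoniazid). The catalogue structure, the `(gene, change)` keys and the function names are hypothetical; only two well-known markers (rpoB S450L, katG S315T) are included, so this is an illustration, not a usable catalogue or any published pipeline's method.

```python
# Minimal sketch (hypothetical): predicting resistance from variant calls
# using a curated catalogue of resistance-causing variants.

# Toy catalogue keyed by (gene, amino-acid change) -> drug affected.
# Only two well-known markers are listed; a real pipeline would load a
# complete, versioned catalogue instead of hard-coding entries.
CATALOGUE = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
}


def predict_resistance(variants):
    """Return the set of drugs predicted resistant from (gene, change) calls.

    Variants absent from the catalogue are simply skipped here.
    """
    return {CATALOGUE[v] for v in variants if v in CATALOGUE}


def is_mdr(resistant_drugs):
    """MDR-TB: resistant to at least both rifampicin and isoniazid."""
    return {"rifampicin", "isoniazid"} <= set(resistant_drugs)


if __name__ == "__main__":
    # ("geneX", "A1B") is a placeholder for a variant not in the catalogue.
    calls = [("rpoB", "S450L"), ("katG", "S315T"), ("geneX", "A1B")]
    drugs = predict_resistance(calls)
    print(sorted(drugs), is_mdr(drugs))  # ['isoniazid', 'rifampicin'] True
```

Keeping the catalogue as data separate from the lookup logic is what allows a shared, versioned repository to be swapped in without changing the pipeline code.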
Summary: Colonial medical reports claimed that tuberculosis (TB) was largely unknown in Africa prior to European contact, providing a “virgin soil” for spread of TB in highly susceptible populations previously unexposed to the disease [1, 2]. This is in direct contrast to recent phylogenetic models which support an African origin for TB [3, 4, 5, 6]. To address this apparent contradiction, we performed a broad genomic sampling of Mycobacterium tuberculosis in Ethiopia. All members of the M. tuberculosis complex (MTBC) arose from clonal expansion of a single common ancestor [7] with a proposed origin in East Africa [3, 4, 8]. Consistent with this proposal, MTBC lineage 7 is almost exclusively found in that region [9, 10, 11]. Although a detailed medical history of Ethiopia supports the view that TB was rare until the 20th century [12], over the last century Ethiopia has become a high-burden TB country [13]. Our results provide further support for an African origin for TB, with some genotypes already present on the continent well before European contact. Phylogenetic analyses reveal a pattern of serial introductions of multiple genotypes into Ethiopia in association with human migration and trade. In place of a “virgin soil” fostering the spread of TB in a previously naive population, we propose that increased TB mortality in Africa was driven by the introduction of European strains of M. tuberculosis alongside expansion of selected indigenous strains having biological characteristics that carry a fitness benefit in the urbanized settings of post-colonial Africa.
Background: Direct whole-genome sequencing of Mycobacterium tuberculosis from clinical specimens will be a major breakthrough in tuberculosis diagnosis and control. To date, direct whole-genome sequencing has never been used in genomic epidemiology, and its accuracy in transmission inference remains unknown. We investigated the technical challenges imposed by direct whole-genome sequencing, and used it to infer transmission clusters and predict drug resistance.
Methods: Using an optimised workflow, we did direct whole-genome sequencing of 37 clinical specimens from 23 tuberculosis patients. Nine sputum samples from nine patients who were infected with different non-tuberculous mycobacteria and were culture-negative for tuberculosis were used as controls in the qPCR assays and pre-sequencing runs. Additionally, 780 clinical isolates from the Comunidad Valenciana region (Spain) were whole-genome sequenced between Jan 1, 2014, and Dec 31, 2016. We analysed the genomic variants to build a tuberculosis transmission network for the region, including the clinical specimens, and to predict drug susceptibility profiles.
Findings: After sequencing 37 clinical specimens, 28 specimens (22 [85%] of 26 smear-positive and six [55%] of 11 smear-negative) met the quality criteria for downstream analysis. All 28 clinical specimens clustered with their matching culture isolates, with a median distance of 0 single nucleotide polymorphisms (SNPs). Of the 28 clinical specimens, 16 (57%) were accurately assigned to ten transmission clusters in the region, and 12 (43%) were unique cases. Transmission inferences and drug-susceptibility predictions from direct whole-genome sequencing data were concordant with sequences from corresponding cultures and phenotypic drug-susceptibility testing. Complete genomic analysis, delivered within a week of specimen receipt, cost €217 per sample (excluding personnel costs).
Interpretation: Direct whole-genome sequencing could be used to accurately delineate tuberculosis transmission clusters and to conduct culture-independent surveillance. Compared with conventional approaches, direct whole-genome sequencing allows researchers to do real-time genomic epidemiology and drug resistance surveillance in settings where culture and drug susceptibility testing are not available.
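The abstract reports SNP distances between specimens and isolates and the assignment of specimens to transmission clusters, but it does not state how distances were computed or which SNP threshold defined a cluster. As a minimal sketch, assuming aligned consensus sequences where `N` marks a missing base and a hypothetical `threshold` parameter, pairwise SNP distances and single-linkage clusters could be computed as follows. Sample names and sequences are invented for illustration.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ, skipping 'N' (missing)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and a != "N" and b != "N"
    )


def cluster(samples, threshold):
    """Single-linkage clustering: any pair within `threshold` SNPs is joined.

    `threshold` is a hypothetical parameter; the cut-off used in the study
    is not given in the abstract.
    """
    parent = {name: name for name in samples}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if snp_distance(samples[a], samples[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Invented toy alignment: a direct specimen, its matching culture, and two isolates.
samples = {
    "specimen_1": "ACGTACGT",
    "culture_1": "ACGTACGT",
    "isolate_2": "ACGAACGT",
    "isolate_3": "TTGTACCA",
}

print(snp_distance(samples["specimen_1"], samples["culture_1"]))  # 0
print(cluster(samples, threshold=1))
# [['culture_1', 'isolate_2', 'specimen_1'], ['isolate_3']]
```

Single linkage is used here only because it is simple: a sample joins a cluster if it is close to any member. A consequence is that clusters can chain together samples that are individually far apart, which is one reason a real analysis would justify its threshold and linkage rule explicitly.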
Background: Contaminant DNA is a well-known confounding factor in molecular biology and in genomic repositories. Strikingly, analysis workflows for whole-genome sequencing (WGS) data commonly do not account for errors potentially introduced by contamination, which could lead to incorrect estimates of allele frequency in both basic and clinical research.
Results: We used a taxonomic filter to remove contaminant reads from more than 4000 bacterial samples from 20 different studies and performed a comprehensive evaluation of the extent and impact of contaminant DNA in WGS. We found that contamination is pervasive and can introduce large biases in variant analysis. We showed that these biases can result in hundreds of false-positive and false-negative SNPs, even for samples with slight contamination. Studies investigating complex biological traits from sequencing data can be completely biased if contamination is neglected during the bioinformatic analysis, and we demonstrate that removing contaminant reads with a taxonomic classifier permits more accurate variant calling. We used both real and simulated data to evaluate and implement reliable, contamination-aware analysis pipelines.
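The mechanism by which contamination distorts allele frequencies can be shown with a small worked example. The sketch below assumes each read at a site already carries a label from an upstream taxonomic classifier (the `"MTBC"` and `"contaminant"` labels, the read structure and the function names are all hypothetical) and compares the alternative-allele frequency before and after discarding reads not assigned to the target taxon. It illustrates the general idea of taxonomic read filtering, not the specific classifier or pipeline used in the study.

```python
from collections import Counter


def allele_frequency(bases, alt):
    """Fraction of reads covering a site that carry the `alt` base."""
    counts = Counter(bases)
    total = sum(counts.values())
    return counts[alt] / total if total else 0.0


def taxonomic_filter(reads, keep_taxa):
    """Keep only reads whose (hypothetical) taxonomic label is in `keep_taxa`."""
    return [read for read in reads if read["taxon"] in keep_taxa]


# Hypothetical pileup at one site: 20 target reads (18 A, 2 G) plus
# 10 contaminant reads that all happen to show G at this position.
reads = (
    [{"taxon": "MTBC", "base": "A"}] * 18
    + [{"taxon": "MTBC", "base": "G"}] * 2
    + [{"taxon": "contaminant", "base": "G"}] * 10
)

raw_af = allele_frequency([r["base"] for r in reads], alt="G")
clean_reads = taxonomic_filter(reads, keep_taxa={"MTBC"})
clean_af = allele_frequency([r["base"] for r in clean_reads], alt="G")

print(f"G frequency before filtering: {raw_af:.2f}")    # 0.40
print(f"G frequency after filtering:  {clean_af:.2f}")  # 0.10
```

In this toy case, a minor allele present in 10% of the target reads appears at 40% once contaminant reads are included, which is the kind of shift that can turn a low-frequency variant into a spurious call, or mask a real one, depending on the caller's frequency cut-off.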