Whole genome sequencing (WGS) of Mycobacterium tuberculosis has rapidly evolved from a research tool to a clinical application for the diagnosis and management of tuberculosis and in public health surveillance. This evolution has been facilitated by the dramatic drop in costs, advances in technology, and concerted efforts to translate sequencing data into actionable information. There is however a risk that, in the absence of a consensus and international standards, the widespread use of WGS technology may result in data and processes that lack harmonisation, comparability and validation. In this review, we outline the current landscape of WGS pipelines and applications and set out best practices for M. tuberculosis WGS, including standards for bioinformatics pipelines, a curated repository of resistance-causing variants, phylogenetic analyses, quality control processes, and standardised reporting.

1. Introduction

Mycobacterium tuberculosis complex (Mtbc) pathogens are collectively the top infectious disease killer globally, causing 10 million new tuberculosis (TB) cases annually [1]. Increasingly, new TB cases are already resistant to rifampicin and isoniazid (termed multidrug resistance; MDR-TB), the key first line drugs [1]. Tackling the spread and drug resistance burden of this pathogen requires concerted global effort in prevention, diagnosis, treatment and surveillance.
Improved understanding of the genomic variants that allow Mycobacterium tuberculosis (Mtb) to acquire drug resistance, or tolerance, and increase its virulence is an important factor in controlling the current tuberculosis epidemic. Current approaches to Mtb sequencing, however, cannot reveal Mtb’s full genomic diversity due to the strict requirements of low contamination levels, high Mtb sequence coverage and elimination of complex regions. We have developed the XBS (compleX Bacterial Samples) bioinformatics pipeline, which implements joint calling and machine-learning-based variant filtering tools to specifically improve variant detection in the important Mtb samples that do not meet these criteria, such as those from unbiased sputum samples. Using novel simulated datasets, which permit exact accuracy verification, XBS was compared to the UVP and MTBseq pipelines. Accuracy statistics showed that all three pipelines performed equally well for sequence data that resemble those obtained from culture isolates with high depth of coverage and low-level contamination. In the complex genomic regions, however, XBS accurately identified 9.0% more SNPs and 8.1% more single nucleotide insertions and deletions than the WHO-endorsed unified analysis variant pipeline (UVP). XBS also had superior accuracy for sequence data that resemble those obtained directly from sputum samples, where depth of coverage is typically very low and contamination levels are high. XBS was the only pipeline not affected by low depth of coverage (5–10×), type of contamination and excessive contamination levels (>50%). Whole genome sequencing (WGS) data from clinical samples confirmed the simulation results: XBS had a higher sensitivity (98.8%) when analysing culture isolates and identified 13.9% more variable sites in WGS data from sputum samples than MTBseq, without evidence of false positive variants when rRNA regions were excluded.
The XBS pipeline facilitates sequencing of less-than-perfect Mtb samples. These advances will benefit future clinical applications of Mtb sequencing, especially WGS directly from clinical specimens, thereby avoiding in vitro biases and making many more samples available for drug resistance and other genomic analyses. The additional genetic resolution and increased sample success rate will improve genome-wide association studies and sequence-based transmission studies.
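The benefit of simulated datasets described above is that every called variant can be checked against a known truth set. A minimal sketch of that kind of accuracy check follows; it is not the XBS, UVP or MTBseq code, and the `score_calls` helper, the `Variant` tuple layout and the toy positions are all assumptions made for illustration.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical representation: (genome position, reference allele, alternate allele).
Variant = Tuple[int, str, str]


class Accuracy(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    sensitivity: float
    precision: float


def score_calls(truth: Set[Variant], called: Set[Variant]) -> Accuracy:
    """Compare a pipeline's calls with the variants placed in a simulated genome."""
    tp = len(truth & called)   # simulated variants that were called
    fp = len(called - truth)   # calls that were never simulated
    fn = len(truth - called)   # simulated variants that were missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return Accuracy(tp, fp, fn, sensitivity, precision)


# Toy data only: these positions are invented and carry no biological meaning.
truth = {(100, "C", "T"), (200, "A", "G"), (300, "G", "A"), (400, "T", "C")}
called = {(100, "C", "T"), (200, "A", "G"), (300, "G", "A"), (500, "G", "C")}

result = score_calls(truth, called)
print(f"sensitivity={result.sensitivity:.2f} precision={result.precision:.2f}")
```

Matching on exact (position, reference, alternate) tuples is the simplest choice; insertions and deletions in real data would typically need normalisation before comparison.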
Following a huge global effort, the first World Health Organization (WHO)-endorsed catalogue of 17,356 variants in the Mycobacterium tuberculosis complex, along with their classification as associated with resistance (interim), not associated with resistance (interim) or uncertain significance, was made public in June 2021. This marks a critical step towards the application of next generation sequencing (NGS) data for clinical care. Unfortunately, the variant format used makes it difficult to look up variants when NGS data is generated by other bioinformatics pipelines. Furthermore, the large number of variants of uncertain significance in the catalogue hampers its usability in clinical practice. We successfully converted 98.3% of variants from the WHO catalogue format to the standardized HGVS format. We also created TBProfiler version 4.4.0 to automate the calling of all variants located in the tier 1 and 2 candidate resistance genes along with their classification when listed in the WHO catalogue. Using a representative sample of 339 clinical isolates from South Africa containing 691 variants in a tier 1 or 2 gene, TBProfiler classified 105 (15%) variants as conferring resistance, 72 (10%) as not conferring resistance and 514 (74%) as unclassified, with an average of 29 unclassified variants per isolate. Using a second cohort of 56 clinical isolates from a TB outbreak in Spain containing 21 variants in the tier 1 and 2 genes, TBProfiler classified 13 (61.9%) as unclassified, 7 (33.3%) as not conferring resistance, and a single variant (4.8%) as conferring resistance. Continued global efforts using standardized methods for genotyping, phenotyping and bioinformatic analyses will be essential to ensure that knowledge on genomic variants translates into improved patient care.
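To illustrate the kind of format conversion described above, the sketch below rewrites a simple amino acid substitution into HGVS protein notation. It assumes a catalogue entry written as `gene_RefPosAlt` with one-letter amino acid codes (for example `rpoB_S450L`); that input layout, the `to_hgvs_protein` helper and its return shape are assumptions for illustration, not the authors' conversion code or the TBProfiler implementation. Entries that do not fit the pattern (promoter changes, indels, stop codons) are returned as `None` rather than guessed.

```python
import re
from typing import Optional, Tuple

# Standard one-letter to three-letter amino acid codes.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Assumed input layout: <gene>_<ref aa><codon number><alt aa>, e.g. "rpoB_S450L".
_SUBSTITUTION = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+)_(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$"
)


def to_hgvs_protein(entry: str) -> Optional[Tuple[str, str]]:
    """Return (gene, HGVS protein change), or None if the entry is not a
    simple amino acid substitution in the assumed layout."""
    match = _SUBSTITUTION.match(entry)
    if match is None:
        return None
    ref, alt = match["ref"], match["alt"]
    if ref not in AA3 or alt not in AA3:
        return None
    return match["gene"], f"p.{AA3[ref]}{match['pos']}{AA3[alt]}"


for entry in ["rpoB_S450L", "katG_S315T", "fabG1_c-15t"]:
    print(entry, "->", to_hgvs_protein(entry))
```

Returning `None` for unrecognised entries keeps unconverted variants visible, which mirrors the fact that not every catalogue entry could be converted.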
Background: Personalized medicine tailors care based on the patient’s or pathogen’s genotypic and phenotypic characteristics. An automated Clinical Decision Support System (CDSS) could help translate the genotypic and phenotypic characteristics into optimal treatment and thus facilitate implementation of individualized treatment by less experienced physicians.

Methods: We developed a hybrid knowledge- and data-driven treatment recommender CDSS. Stakeholders and experts first define the knowledge base by identifying and quantifying drug and regimen features for the prototype model input. In an iterative manner, feedback from experts is harvested to generate model training datasets, machine learning methods are applied to identify complex relations and patterns in the data, and model performance is assessed by estimating the precision at one, mean reciprocal rank and mean average precision. Once the model performance no longer iteratively increases, a validation dataset is used to assess model overfitting.

Results: We applied the novel methodology to develop a treatment recommender CDSS for individualized treatment of drug resistant tuberculosis as a proof of concept. Using input from stakeholders and three rounds of expert feedback on a dataset of 355 patients with 129 unique drug resistance profiles, the model had a 95% precision at 1 indicating that the highest ranked treatment regimen was considered appropriate by the experts in 95% of cases. Use of a validation data set however suggested substantial model overfitting, with a reduction in precision at 1 to 78%.

Conclusion: Our novel and flexible hybrid knowledge- and data-driven treatment recommender CDSS is a first step towards the automation of individualized treatment for personalized medicine. Further research should assess its value in fields other than drug resistant tuberculosis, develop solid statistical approaches to assess model performance, and evaluate their accuracy in real-life clinical settings.
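The three ranking metrics named in the Methods can be sketched as follows. This is a generic illustration, not the authors' evaluation code: it assumes each patient's ranked regimen list has been reduced to booleans marking whether experts judged the regimen at that rank appropriate, and it normalises average precision by the number of appropriate regimens that appear in the list. All function names and the toy data are hypothetical.

```python
from typing import Sequence

Ranking = Sequence[bool]  # True = regimen at this rank judged appropriate


def precision_at_1(rankings: Sequence[Ranking]) -> float:
    """Fraction of patients whose top-ranked regimen was appropriate."""
    return sum(1 for r in rankings if r and r[0]) / len(rankings)


def mean_reciprocal_rank(rankings: Sequence[Ranking]) -> float:
    """Mean of 1/rank of the first appropriate regimen (0 if there is none)."""
    total = 0.0
    for ranking in rankings:
        for rank, appropriate in enumerate(ranking, start=1):
            if appropriate:
                total += 1.0 / rank
                break
    return total / len(rankings)


def average_precision(ranking: Ranking) -> float:
    """Mean of precision@k over the ranks k holding an appropriate regimen."""
    hits = 0
    score = 0.0
    for rank, appropriate in enumerate(ranking, start=1):
        if appropriate:
            hits += 1
            score += hits / rank
    return score / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Ranking]) -> float:
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Toy example: four patients, three ranked regimens each (invented data).
rankings = [
    [True, False, True],
    [False, True, False],
    [False, False, False],
    [True, True, False],
]
p_at_1 = precision_at_1(rankings)
mrr = mean_reciprocal_rank(rankings)
m_ap = mean_average_precision(rankings)
print(f"P@1={p_at_1:.3f} MRR={mrr:.3f} MAP={m_ap:.3f}")
```

Precision at 1 only looks at the top recommendation, which matches how the 95% and 78% figures are interpreted in the Results; the other two metrics also reward appropriate regimens ranked lower in the list.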
Background: Rifampicin-resistant tuberculosis (RR-TB) remains an important global health problem. Ideally, the complete drug-resistance profile guides individualized treatment for all RR-TB patients, but this is only practised in high-income countries. Implementation of whole genome sequencing (WGS) technologies into routine care in low and middle-income countries has not become a reality due to the expected implementation challenges, including translating WGS results into individualized treatment regimen composition.

Methods: This trial is a pragmatic, single-blinded, randomized controlled medical device trial of a WGS-guided automated treatment recommendation strategy for individualized treatment of RR-TB. Subjects are 18 years or older and diagnosed with pulmonary RR-TB in four of the five health districts of the Free State province in South Africa. Participants are randomized in a 1:1 ratio to either the intervention (a WGS-guided automated treatment recommendation strategy for individualized treatment of RR-TB) or control (RR-TB treatment according to the national South African guidelines). The primary effectiveness outcome is the bacteriological response to treatment measured as the rate of change in time to liquid culture positivity during the first 6 months of treatment. Secondary effectiveness outcomes include cure rate, relapse rate (recurrence of RR-TB disease) and TB free survival rate in the first 12 months following RR-TB treatment completion. Additional secondary outcomes of interest include safety, the feasibility of province-wide implementation of the strategy into routine care, and health economic assessment from a patient and health systems perspective.

Discussion: This trial will provide important real-life evidence regarding the feasibility, safety, cost, and effectiveness of a WGS-guided automated treatment recommendation strategy for individualized treatment of RR-TB. Given the pragmatic nature, the trial will assist policymakers in the decision-making regarding the integration of next-generation sequencing technologies into routine RR-TB care in high TB burden settings.

Trial registration: ClinicalTrials.gov NCT05017324. Registered on August 23, 2021.