Benchmarking the empirical accuracy of short-read sequencing across the M. tuberculosis genome

Marin, M. G.; Vargas, Roger; Harris, Michael A.; Jeffrey, Brendan M.; Epperson, L. Elaine; Durbin, David; Strong, Michael; Iqbal, Zamin; Akhundova, Irada; Vashakidze, Sergo; Crudu, Valeriu; Rosenthal, Alex; Farhat, Maha

doi:10.1093/bioinformatics/btac023

Cited by 20 publications

(17 citation statements)

References 33 publications


“…On this basis, 150 out of 169 pe/ppe genes with good coverage (>0.7 normalized mean coverage) were included to complement the genomic regions analysed and therefore potentially achieve a deeper separation of the transmission clusters. These regions overlapped with previous studies [16, 22]. An extra 568 high-quality SNPs were added, resulting in one additional SNP within the transmission cluster from L2 (S8, S9) and four extra SNPs for L3 (S2, S3, S4), thereby slightly increasing the differences obtained within highly similar samples (Figure 2C).…”

Section: Results (supporting)

confidence: 86%
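The coverage filter mentioned in the statement above (genes kept when normalized mean coverage exceeds 0.7) can be sketched as follows. This is only an illustration: the quoted text does not define the normalization, so the sketch assumes it means a gene's mean per-base depth divided by the genome-wide mean depth. The function names, gene names, and depth values are all hypothetical.

```python
from statistics import mean


def normalized_mean_coverage(gene_depths, genome_mean_depth):
    """Return a gene's mean per-base depth divided by the genome-wide mean depth.

    Assumption: this ratio is what "normalized mean coverage" refers to.
    """
    if genome_mean_depth <= 0:
        raise ValueError("genome_mean_depth must be positive")
    return mean(gene_depths) / genome_mean_depth


def genes_passing_coverage(depths_by_gene, genome_mean_depth, threshold=0.7):
    """Return sorted names of genes whose normalized mean coverage exceeds threshold."""
    return sorted(
        gene
        for gene, depths in depths_by_gene.items()
        if normalized_mean_coverage(depths, genome_mean_depth) > threshold
    )


# Hypothetical per-base depths for two made-up genes.
example_depths = {
    "gene_a": [40, 50, 60],  # mean depth 50, ratio 1.0
    "gene_b": [5, 10, 15],   # mean depth 10, ratio 0.2
}
passing = genes_passing_coverage(example_depths, genome_mean_depth=50)
```

With a genome-wide mean depth of 50, only `gene_a` clears the 0.7 threshold in this toy input.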

“…Blind spots for Illumina sequencing technologies have been previously reported [18], for which long-read sequencing technologies can assist [20, 21]. In accordance with previous work [21], our study demonstrates that long-read data has the potential to elucidate complex regions, such as pe/ppe genes, which due to their GC-rich and repetitive nature have been systematically excluded from WGS analysis, losing potential phylogenetic information [16, 22]. Coverage of the Illumina replicates on these regions, and more specifically in the most diverse genes of these two families, was shown to be significantly lower than their ONT counterparts, supporting the potential inclusion of these genes for the downstream analysis in WGS from ONT.…”

Section: Discussion (supporting)

confidence: 89%


Portable sequencing of Mycobacterium tuberculosis for clinical and epidemiological applications

Gomez-Gonzalez, Campino, Phelan et al. 2022. Briefings in Bioinformatics

With >1 million associated deaths in 2020, human tuberculosis (TB) caused by the bacteria Mycobacterium tuberculosis remains one of the deadliest infectious diseases. A plethora of genomic tools and bioinformatics pipelines have become available in recent years to assist the whole genome sequencing of M. tuberculosis. The Oxford Nanopore Technologies (ONT) portable sequencer is a promising platform for cost-effective application in clinics, including personalizing treatment through detection of drug resistance-associated mutations, or in the field, to assist epidemiological and transmission investigations. In this study, we performed a comparison of 10 clinical isolates with DNA sequenced on both long-read ONT and (gold standard) short-read Illumina HiSeq platforms. Our analysis demonstrates the robustness of the ONT variant calling for single nucleotide polymorphisms, despite the high error rate. Moreover, because of improved coverage in repetitive regions where short sequencing reads fail to align accurately, ONT data analysis can incorporate additional regions of the genome usually excluded (e.g. pe/ppe genes). The resulting extra resolution can improve the characterization of transmission clusters and dynamics based on inferring closely related isolates. High concordance in variants in loci associated with drug resistance supports its use for the rapid detection of resistant mutations. Overall, ONT sequencing is a promising tool for TB genomic investigations, particularly to inform clinical and surveillance decision-making to reduce the disease burden.


“…Recently, few studies characterized those specific regions in the genome, showing that they are close to each other and present a homologous sequence (percent identity of 81%) due to gene duplication, indicating that they could potentially present critical issues with every technology (Karboul et al, 2008; Phelan et al, 2016; de Maio et al, 2020). Interestingly, the remaining PE and PPE regions showed an overall acceptable coverage for SRS and as already described in other studies, the common practice of excluding those genes from the analysis, due to the high GC-content and the repetitive sequences, could be overcome by removing only the PE_PGRS genes (Modlin et al, 2021; Marin et al, 2022).…”

Section: Discussion (mentioning)

confidence: 90%

“…The same technology has been used to investigate tuberculosis outbreaks and transmission dynamics by adopting whole-genome SNP (wgSNP) or core genome Multi-Locus Sequence Typing (cgMLST) schemes assessing genetic relatedness of MTB genomes (Kohl et al, 2014, 2018). However, short-reads technologies are not able to fully resolve hard-to-sequence regions, because has suboptimal capacity to resolve reliably large structural variations, gene duplications, or variations in repetitive regions (Modlin et al, 2021), thereby reducing coverage depth involving a lack of characterization in terms of drug resistance, virulence, and transmission analysis (Medha et al, 2021; Marin et al, 2022). Accurately resolving such regions becomes critical to close bacterial genomes, obtaining more information about virulence, evolutionary mechanisms of drug resistance, and on strain relatedness.…”

Section: Introduction (mentioning)

confidence: 99%

Advantages of long- and short-reads sequencing for the hybrid investigation of the Mycobacterium tuberculosis genome

Marco, Spitaleri, Battaglia et al. 2023. Front. Microbiol.

Introduction: In the fight to limit the global spread of antibiotic resistance, computational challenges associated with sequencing technology can impact the accuracy of downstream analysis, including drug resistance identification, transmission, and genome resolution. About 10% of Mycobacterium tuberculosis (MTB) genome is constituted by the PE/PPE family, a GC-rich repetitive genome region. Although sequencing using short read technology is widely used, it is well recognized its limit in the PE/PPE regions due to the unambiguously mapping process onto the reference genome. The aim of this study was to compare the performances of short-reads (SRS), long-reads (LRS) and hybrid-reads (HYBR) based analysis over different common investigative tasks: genome coverage estimation, variant calling and cluster analysis, drug resistance detection and de novo assembly.

Methods: For the study 13 model MTB clinical isolates were sequenced with both SRS and LRS. HYBR were produced correcting the long reads with the short reads. The fastq from the three approaches were then processed using a customized version of MTBseq for genome coverage estimation and variant calling and using two different assemblers for de novo assembly evaluation.

Results: Estimation of genome coverage performances showed lower 8X breadth coverage for SRS respect to LRS and HYBR: considering the PE/PPE genes, SRS showed low results for the PE_PGRS family, while obtained acceptable coverage in PE and PPE genes; LRS and HYBR reached optimal coverages in PE/PPE genes. For variant calling HYBR showed the highest resolution, detecting the highest percentage of uniquely identified mutations compared to LRS and SRS. All three approaches agreed on the identification of two major clusters, with HYBR identifying an higher number of SNPs between the two clusters. Comparing the quality of the assemblies, HYBR and LRS obtained better results than SRS.

Discussion: In conclusion, depending on the aim of the investigation, both SRS and LRS present complementary advantages and limitations implying that for a full resolution of MTB genomes, where all the mentioned analyses and both technologies are needed, the use of the HYBR approach represents a valid option and a well-rounded strategy.


“…We assessed the congruence in variant calls between short-read Illumina data and long-read PacBio data for a set of isolates that underwent sequencing with both technologies (Marin et al, 2022). Using 31 isolates for which both Illumina and a complete PacBio assembly were available, we evaluated the empirical base-pair recall (EBR) of all base-pair positions of the H37rv reference genome.…”

Section: Methods (mentioning)

confidence: 99%
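The per-position evaluation described in the statement above can be illustrated with a short sketch. It assumes, as a simplification, that empirical base-pair recall (EBR) at a reference position is the fraction of isolates in which the Illumina-based call agrees with the ground truth derived from the PacBio assembly. The function name and the boolean input layout are hypothetical, and the handling of ambiguous or unevaluable positions in the actual study is not modelled here.

```python
def empirical_base_pair_recall(agreement_by_isolate):
    """Return per-position EBR from a list of per-isolate agreement rows.

    agreement_by_isolate[i][p] is True when, for isolate i, the short-read
    call at reference position p matched the long-read ground truth
    (assumed input layout, not taken from the source).
    """
    if not agreement_by_isolate:
        raise ValueError("at least one isolate is required")
    n_positions = len(agreement_by_isolate[0])
    if any(len(row) != n_positions for row in agreement_by_isolate):
        raise ValueError("all isolates must cover the same positions")
    n_isolates = len(agreement_by_isolate)
    # Fraction of isolates agreeing with the ground truth at each position.
    return [
        sum(row[p] for row in agreement_by_isolate) / n_isolates
        for p in range(n_positions)
    ]


# Toy input: two isolates, three reference positions.
toy_agreement = [
    [True, True, False],
    [True, False, False],
]
toy_ebr = empirical_base_pair_recall(toy_agreement)
```

In this toy input the first position is recalled in both isolates, the second in one of two, and the third in neither.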

Phase variation as a major mechanism of adaptation in Mycobacterium tuberculosis complex

Vargas, Luna, Freschi et al. 2022. Preprint (self-citation)

Phase variation induced by insertions and deletions (INDELs) in genomic homopolymeric tracts (HT) can silence and regulate genes in pathogenic bacteria but this process is not characterized in MTBC adaptation. We leverage 31,428 diverse clinical isolates to identify genomic regions including phase-variants under positive selection. Of 87,651 INDEL events that emerge repeatedly across the phylogeny, 12.4% are phase-variants within HTs (0.02% of the genome by length). We estimated the in-vitro frameshift rate in a neutral HT at 100x the neutral substitution rate at 1.1 × 10^-5 frameshifts/HT/year. Using neutral evolution simulations, we identified 4,098 substitutions and 45 phase-variants to be putatively adaptive to MTBC (P<0.002). We experimentally confirm that a putatively adaptive phase-variant alters the expression of espA, a critical mediator of ESX-1 dependent virulence. Our evidence supports a new hypothesis that phase variation in the ESX-1 system of MTBC can act as a toggle between antigenicity and survival in the host.
