2019
DOI: 10.1101/733642
Preprint

Genomic variant identification methods alter Mycobacterium tuberculosis transmission inference

Abstract: Pathogen genomic data are increasingly used to characterize global and local transmission patterns of important human pathogens and to inform public health interventions. Yet there is no current consensus on how to measure genomic variation. We investigated the effects of variant identification approaches on transmission inferences for M. tuberculosis by comparing variants identified by five different groups in the same sequence data from a clonal outbreak. We then measured the performance of com…

Cited by 6 publications (5 citation statements)
References 70 publications
“…1). In addition, the inclusion of two synthetic FASTQ files generated from an edited reference sequence (22) and differing by three single SNPs, one double nucleotide change, and one 3-bp insertion (sample 12 and 13 in Table 3) were reported by participating laboratories as differing by a number of SNPs ranging from one to seven, which is in line with previous findings (21). These minor differences between pipelines, although having only minor effects when detecting potentially epidemiologically linked isolates, may have implications when inferring transmission chains as the natural accumulation of mutations in M. tuberculosis is extremely slow (9).…”
Section: Discussion
Citation type: supporting
confidence: 87%
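
The statement above describes two synthetic samples that differ by three single SNPs, one double nucleotide change, and one 3-bp insertion, yet were reported with different SNP distances. One possible contributor is the counting convention a pipeline applies to non-SNP differences. The sketch below illustrates this with the five differences named in the quote. The `Difference` class, the `snp_distance` function, and the three `mnp_mode` options are hypothetical names introduced here for illustration; they are not the participating laboratories' pipelines, and the sketch does not attempt to reproduce the reported range of one to seven.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Difference:
    kind: str    # "snp", "mnp" (multi-nucleotide change), or "indel"
    length: int  # number of bases affected


# The five differences between the two synthetic samples, as listed in the
# quoted statement: three single SNPs, one double nucleotide change, and
# one 3-bp insertion.
DIFFERENCES = [
    Difference("snp", 1),
    Difference("snp", 1),
    Difference("snp", 1),
    Difference("mnp", 2),
    Difference("indel", 3),
]


def snp_distance(differences, mnp_mode="event", count_indels=False):
    """Count a pairwise distance under a hypothetical counting convention.

    mnp_mode:
        "ignore"   - multi-nucleotide changes are not counted
        "event"    - each multi-nucleotide change counts as one difference
        "per_base" - each changed base counts as one difference
    count_indels:
        if True, each insertion or deletion counts as one difference
    """
    if mnp_mode not in {"ignore", "event", "per_base"}:
        raise ValueError(f"unknown mnp_mode: {mnp_mode!r}")

    total = 0
    for diff in differences:
        if diff.kind == "snp":
            total += 1
        elif diff.kind == "mnp":
            if mnp_mode == "event":
                total += 1
            elif mnp_mode == "per_base":
                total += diff.length
        elif diff.kind == "indel" and count_indels:
            total += 1
    return total


if __name__ == "__main__":
    for mode in ("ignore", "event", "per_base"):
        for indels in (False, True):
            distance = snp_distance(DIFFERENCES, mode, indels)
            print(f"mnp_mode={mode:<8} count_indels={indels!s:<5} -> {distance}")
```

Under these assumed conventions the same pair of samples yields distances from 3 to 6, which shows why a fixed SNP threshold is only comparable between pipelines that count differences the same way.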
“…Nonetheless, some laboratories did report structurally high SNP distances for closely related isolates, even if the isolates were correctly reported as potentially related. This is likely due to incomplete filtering of noise in the SNP-calling algorithm, often caused by not excluding all poorly mapped reads, used by different laboratories, which makes the comparison of precise SNP distances provided by the EQA participants potentially misleading (21).…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…First of all, while generally all pe and ppe genes are excluded from bioinformatic datasets, this approach is overly stringent in most cases. Even short‐read sequencing techniques can reliably map almost all pe and ppe genes thanks to paired‐end technologies and increased read lengths, if only pe_pgrs and ppe‐mptr genes/transcripts are excluded (Miran and Farhat – personal communication, Holt et al , ; Ates et al , ; Walter et al , ). Furthermore, knowing the subgroup of a PE/PPE can be an excellent starting point to hypothesize the most‐likely route of secretion and may even suggest functions, or redundancy.…”
Section: Secretion and Functions Of Specific Pe And Ppe Protein Subgr
Citation type: mentioning
confidence: 99%
“…Exporting a simulated outbreak 6. This simulation protocol of TransPhylo is especially useful to simulate outbreaks and benchmark how accurate methods of inference (either Basic Protocol 3 of TransPhylo or some other method) are likely to be when applied to real datasets (Ness et al, 2019; Stimson et al, 2019; Walter et al, 2019). In this case, it is important to make sure that the parameters used for the simulation are realistic for the pathogen of interest in the real data.…”
Section: Simulation Of An Outbreak
Citation type: mentioning
confidence: 99%