2021
DOI: 10.1101/2021.01.15.426832
Preprint

Optimizing viral genome subsampling by genetic diversity and temporal distribution (TARDiS) for Phylogenetics

Abstract: TARDiS for Phylogenetics is a novel tool for optimal genetic sub-sampling. It optimizes both genetic diversity and temporal distribution through a genetic algorithm. TARDiS, along with example data sets and a user manual, is available at https://github.com/smarini/tardis-phylogenetics
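
To make the idea in the abstract concrete, the following is a minimal sketch of how a genetic algorithm could pick a subsample that balances genetic diversity against temporal coverage. It is not the TARDiS implementation: the fitness terms (mean pairwise Hamming distance and the fraction of occupied time bins), the equal weights, the function names, and all parameters are assumptions chosen for illustration. It expects aligned sequences of equal length, sampling dates as decimal years, and a subsample size k with 2 <= k < number of sequences.

```python
import random
from itertools import combinations

def hamming(a, b):
    """Fraction of differing sites between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def diversity(subset, seqs):
    """Mean pairwise distance among the selected sequences (assumed diversity term)."""
    pairs = list(combinations(sorted(subset), 2))
    return sum(hamming(seqs[i], seqs[j]) for i, j in pairs) / len(pairs)

def temporal_spread(subset, dates, n_bins=4):
    """Fraction of equal-width time bins holding at least one selected sample."""
    lo, hi = min(dates), max(dates)
    width = (hi - lo) / n_bins or 1.0  # avoid dividing by zero when all dates match
    bins = {min(int((dates[i] - lo) / width), n_bins - 1) for i in subset}
    return len(bins) / n_bins

def fitness(subset, seqs, dates, w_div=0.5, w_time=0.5):
    """Weighted sum of the two terms; the weights are illustrative assumptions."""
    return w_div * diversity(subset, seqs) + w_time * temporal_spread(subset, dates)

def crossover(a, b, k, rng):
    """Draw k indices from the union of two parent subsets."""
    return frozenset(rng.sample(sorted(a | b), k))

def mutate(subset, n_total, rng):
    """Swap one selected index for one currently unselected index."""
    child = set(subset)
    child.remove(rng.choice(sorted(child)))
    child.add(rng.choice([i for i in range(n_total) if i not in child]))
    return frozenset(child)

def ga_subsample(seqs, dates, k, pop_size=30, generations=50, seed=0):
    """Return sorted indices of a size-k subsample found by a simple elitist GA."""
    rng = random.Random(seed)
    n = len(seqs)

    def score(s):
        return fitness(s, seqs, dates)

    pop = [frozenset(rng.sample(range(n), k)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), k, rng), n, rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return sorted(max(pop, key=score))
```

Calling `ga_subsample(seqs, dates, k=4)` returns the indices of the chosen sequences. Representing each candidate as a fixed-size set of indices keeps the subsample size constant under crossover and mutation, which is one simple way to enforce a target size.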

Cited by 5 publications (4 citation statements)
References 24 publications
“…Next, we explored concordance or discordance of viral and proviral populations in both patients with concordant subtypes (PCG3 and SDS4), as well as in the two patients that showed both concordance and discordance in subtypes (VBP2 and JOR10). After subsampling patients VBP2, PCG3, and SDS4, we optimized viral genome genetic diversity and temporal distribution using TARDIS (43) and controlled for the presence of both phylogenetic signals (see Fig. S5).…”
Section: Results (mentioning)
confidence: 99%
“…Sequences from the two runs were aligned using MAFFT (79). Before performing the maximum likelihood (ML) phylogenetic and Bayesian phylodynamic analyses, we randomly subset the VBP2, PCG3, and SDS4 data sets targeting sequences by compartments (plasma and PBMC when needed) and by time point using TARDIS (43). These data sets contained a large number of sequences, many of which were nearly identical and therefore not contributing to the phylogenetic resolution.…”
Section: Methods (mentioning)
confidence: 99%
“…One group of methods identify clusters of related sequences, e.g., TreeCluster (Balaban et al., 2019) or PhyCLIP (Han et al., 2019), but these are unable to objectively select strains within the identified clusters. A second group of methods select or remove single strains: such as TARDiS (Marini et al., 2021) that can perform time-aware sampling of genetic sequences, or Treemmer (Menardo et al., 2018) that reduces taxa on a phylogeny through pruning redundant branches. These selection approaches, either do not account for evolution, or are not able to objectively select representatives across tens of thousands of taxa in a reasonable time.…”
Section: Discussion (mentioning)
confidence: 99%
“…Random subsampling may potentially reduce unknown sampling biases. It is worth noting that the sampling biases may persist even when a large number of samples is available, and large datasets need to be downsampled to make computations tractable [17], [18], [19]. A theoretical analysis of small-sample support estimation in the presence of sampling biases is available at [20].…”
Section: Introduction (mentioning)
confidence: 99%