2023
DOI: 10.1093/ve/vead069

Optimizing ancestral trait reconstruction of large HIV Subtype C datasets through multiple-trait subsampling

Xingguang Li,
Nídia S Trovão,
Joel O Wertheim
et al.

Abstract: Large datasets along with sampling bias represent a challenge for phylodynamic reconstructions, particularly when the study data are obtained from various heterogeneous sources and/or through convenience sampling. In this study, we evaluate the presence of unbalanced sampled distribution by collection date, location, and risk group of human immunodeficiency virus Type 1 Subtype C using a comprehensive subsampling strategy and assess their impact on the reconstruction of the viral spatial and risk group dynamic…

Cited by 3 publications (3 citation statements)
References 57 publications
“…We were also able to capture the highest number of transmission events when analysing locrisk626, indicating that subsampling by date, country and risk group might produce the best dataset for reconstructing the transmission dynamics of HIV-1 subtype C while curbing potential sampling biases, which is in line with previous findings [25, 35, 36].…”
Section: Discussion (supporting)
confidence: 85%
“…Further subsampling was performed with Sequence sampling tool for phylogenetics (SAMPI) [24] to obtain a homogenous collection of samples per country, risk group and date (date, country and risk group; date and country; and date and risk group), as described elsewhere [25]. This resulted in full genome datasets of 626 sequences subsampled based on sampling location, risk group and collection date (locrisk626); 562 sequences subsampled based on sampling location and collection date (loc562); and 393 sequences subsampled based on risk group and collection date (risk393).…”
Section: Methods (mentioning)
confidence: 99%
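
The subsampling described in the statement above (a more even collection of samples per country, risk group and date, in three trait combinations) can be illustrated with a minimal stratified-subsampling sketch. This is not SAMPI's algorithm or interface, which the statement does not describe. The function name `subsample_by_strata`, the `per_stratum` cap, the record fields and the toy metadata are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def subsample_by_strata(records, keys, per_stratum, seed=0):
    """Keep at most `per_stratum` records from each combination of `keys`.

    Hypothetical helper: a generic stratified subsample, not SAMPI itself.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for rec in records:
        # A stratum is one combination of trait values, e.g. (country, risk, year).
        strata[tuple(rec[k] for k in keys)].append(rec)

    kept = []
    for stratum in sorted(strata):
        group = strata[stratum]
        if len(group) <= per_stratum:
            kept.extend(group)  # small strata are kept whole
        else:
            kept.extend(rng.sample(group, per_stratum))  # large strata are thinned
    return kept


# Toy metadata (invented): one oversampled stratum plus several sparse ones.
records = [
    {"id": "s1", "country": "ZA", "risk": "HET", "year": 2005},
    {"id": "s2", "country": "ZA", "risk": "HET", "year": 2005},
    {"id": "s3", "country": "ZA", "risk": "HET", "year": 2005},
    {"id": "s4", "country": "ZA", "risk": "HET", "year": 2005},
    {"id": "s5", "country": "ZA", "risk": "MSM", "year": 2005},
    {"id": "s6", "country": "IN", "risk": "HET", "year": 2005},
    {"id": "s7", "country": "IN", "risk": "HET", "year": 2010},
    {"id": "s8", "country": "ZA", "risk": "HET", "year": 2010},
]

# The three trait combinations named in the statement above.
by_loc_risk_date = subsample_by_strata(records, ("country", "risk", "year"), per_stratum=2)
by_loc_date = subsample_by_strata(records, ("country", "year"), per_stratum=2)
by_risk_date = subsample_by_strata(records, ("risk", "year"), per_stratum=2)
```

Capping each stratum rather than sampling a fixed fraction is one simple way to flatten an oversampled country, risk group or year. Which strata are defined changes which sequences survive, which is why the three trait combinations yield datasets of different sizes.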
“…In addition to fitness, pathogen phylogenies can be strongly shaped by sampling biases [64][65][66]. The major risk here is that oversampling of certain genotypes or lineages could produce a higher branching rate that would be difficult to distinguish from higher pathogen fitness [67][68][69].…”
Section: Decomposing the Components Of Fitness Variation (mentioning)
confidence: 99%