Understanding pandemics depends on characterizing pathogen collections from well-defined and demographically diverse cohorts. Since its emergence in Congo almost a century ago, HIV-1 has spread geographically and diversified genetically into distinct viral subtypes. Phylogenetic analysis can be used to reconstruct the ancestry of the virus and thereby inform on the origin and distribution of subtypes.
We sequenced two 3.6 kb amplicons of HIV-1 genomes from 3,197 participants in a clinical trial with consistent and uniform sampling at sites across 35 countries, and analyzed our data together with another 2,632 genomes that comprehensively reflect HIV-1 genetic diversity. We used maximum likelihood phylogenetic analysis coupled with geographical information to infer ancestral states.
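As a rough illustration of how tip locations on a phylogeny can inform the location of an ancestor, the sketch below computes the marginal probability of each region at the root of a toy tree using Felsenstein's pruning algorithm. It is not the pipeline used in this study: the tree, region labels, branch lengths, and rate are hypothetical, and a symmetric equal-rates model with a uniform root prior is assumed.

```python
import math

# Hypothetical region labels; not the study's actual geographic states.
STATES = ["Africa", "Asia", "Europe"]
RATE = 1.0  # assumed rate of movement between any two regions (arbitrary units)


def transition_prob(same, t, rate=RATE, k=len(STATES)):
    """Probability of staying in the same region (same=True) or ending in one
    specific other region over branch length t, under a symmetric model."""
    decay = math.exp(-k * rate * t)
    return 1 / k + (1 - 1 / k) * decay if same else (1 - decay) / k


def partials(node):
    """Felsenstein pruning: likelihood of the tips below `node` for each
    possible state at `node`. A tip is a region string; an internal node is
    a list of (child, branch_length) pairs."""
    if isinstance(node, str):
        return {s: 1.0 if s == node else 0.0 for s in STATES}
    result = {s: 1.0 for s in STATES}
    for child, t in node:
        below = partials(child)
        for s in STATES:
            result[s] *= sum(transition_prob(s == c, t) * below[c] for c in STATES)
    return result


def root_posterior(tree):
    """Marginal probability of each region at the root (uniform root prior)."""
    p = partials(tree)
    total = sum(p.values())
    return {s: v / total for s, v in p.items()}


# Toy tree with made-up tips and branch lengths.
toy_tree = [
    ([("Africa", 0.1), ("Africa", 0.1)], 0.2),
    ([("Africa", 0.1), ("Asia", 0.3)], 0.1),
]
posterior = root_posterior(toy_tree)
for region, prob in posterior.items():
    print(f"{region}: {prob:.3f}")
```

In practice the rate would be estimated from the data rather than fixed, and the same pruning pass would be combined with a downward pass to obtain states at every internal node, not only the root.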
The majority of our sequenced genomes (n=2,501) were either pure subtypes (A-D, F, G) or CRF01_AE. The diversity and distribution of subtypes differed across geographical regions: the United States showed the most homogeneous subtype population, whereas African samples were the most diverse. We delineated transmission of the four most prevalent subtypes in our dataset (A, B, C, and CRF01_AE). Our results suggest both continuous and frequent transmission of HIV-1 across country borders and single transmission events that seeded endemic population expansions.
Overall, we show that coupling genetic and geographical information on HIV-1 can be used to understand the origin and spread of pandemic pathogens.