The transmission fitness and pathogenesis of HIV-1 are disproportionately influenced by evolution in the five variable regions (V1–V5) of the surface envelope glycoprotein (gp120). Insertions and deletions (indels) are a significant source of evolutionary change in these regions. However, the rate and composition of indels have not yet been quantified through a large-scale comparative analysis of HIV-1 sequences. Here, we develop a phylogenetic method to estimate indel rates for the gp120 variable regions and report results across five major subtypes and two circulating recombinant forms (CRFs) of HIV-1 group M. We processed over 26,000 published HIV-1 gp120 sequences, from which we extracted 6,605 sequences for phylogenetic analysis. We reconstructed time-scaled phylogenies by maximum likelihood and fit a binomial-Poisson model to the observed distribution of indels between closely related pairs of sequences in each tree (cherries). By focusing on cherries in each tree, we obtained phylogenetically independent indel reconstructions, and the shorter time scales in cherries reduced the bias due to purifying selection. Rate estimates ranged from 3.0 × 10⁻⁵ to 1.5 × 10⁻³ indels/nt/year and varied significantly among variable regions and subtypes. Indel rates were significantly lower in V3 relative to V1, and were also lower in HIV-1 subtype B relative to the 01_AE reference. We also found that V1, V2, and V4 tended to accumulate significantly longer indels. Furthermore, we observed that the nucleotide composition of indels was distinct from that of the flanking sequence, with higher frequencies of G and lower frequencies of T. Indels affected N-linked glycosylation sites more often in V1 and V2 than expected by chance, consistent with positive selection on glycosylation patterns within these regions. These results represent the first comprehensive measures of indel rates in HIV-1 gp120 across multiple subtypes and CRFs, and identify novel and unexpected patterns for further research in the molecular evolution of HIV-1.
The transmission and pathogenesis of human immunodeficiency virus type 1 (HIV-1) are disproportionately influenced by evolution in the five variable regions of the virus surface envelope glycoprotein (gp120). Insertions and deletions (indels) are a significant source of evolutionary change in these regions. However, the influx of indels relative to nucleotide substitutions has not yet been quantified through a comparative analysis of HIV-1 sequence data. Here we develop and report results from a phylogenetic method to estimate indel rates for the gp120 variable regions across five major subtypes and two circulating recombinant forms (CRFs) of HIV-1 group M. We processed over 26,000 published HIV-1 gp120 sequences, from which we extracted 6,605 sequences for phylogenetic analysis. In brief, our method employs maximum likelihood to reconstruct phylogenies scaled in time and fits a Poisson model to the observed distribution of indels between closely related pairs of sequences in the tree (cherries). The rate estimates ranged from 3.0 × 10⁻⁵ to 1.5 × 10⁻³ indels/nt/year and varied significantly among variable regions and subtypes. Indel rates were significantly lower in the region encoding variable loop V3, and also lower for HIV-1 subtype B relative to other subtypes. We also found that variable loops V1, V2 and V4 tended to accumulate significantly longer indels. Further, we observed that the nucleotide composition of indel sequences was significantly distinct from that of the flanking sequence in HIV-1 gp120. Indels affected potential N-linked glycosylation sites substantially more often in V1 and V2 than expected by chance, which is consistent with positive selection on glycosylation patterns within these regions of gp120. These results represent the first comprehensive measures of indel rates in HIV-1 gp120 across multiple subtypes and CRFs, and identify novel and unexpected patterns for further research in the molecular evolution of HIV-1.
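The abstract does not spell out the form of the model fit to cherries, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each cherry contributes a binary outcome (indel observed or not) and that indels arise as a Poisson process, so the probability of at least one indel over a cherry's total branch length t is 1 − exp(−rate × t). The function names, the search bounds, and the toy data are all hypothetical.

```python
import math

def log_likelihood(rate, cherries):
    """Log-likelihood of an indel rate (events/year) given a list of cherries.

    Each cherry is a (t, has_indel) pair: t is the total branch length in
    years separating the two tips (assumed > 0), and has_indel is True if
    the pair differs by at least one indel in the region of interest.
    """
    ll = 0.0
    for t, has_indel in cherries:
        if has_indel:
            # P(at least one Poisson event) = 1 - exp(-rate * t)
            ll += math.log(-math.expm1(-rate * t))
        else:
            # P(no events) = exp(-rate * t)
            ll -= rate * t
    return ll

def fit_indel_rate(cherries, lo=1e-8, hi=10.0, iters=200):
    """Maximise the log-likelihood by ternary search over log(rate).

    The bounds lo and hi are arbitrary assumptions; the log-likelihood is
    unimodal in the rate, so a ternary search is sufficient here.
    """
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        m1 = a + (b - a) / 3
        m2 = b - (b - a) / 3
        if log_likelihood(math.exp(m1), cherries) < log_likelihood(math.exp(m2), cherries):
            a = m1
        else:
            b = m2
    return math.exp((a + b) / 2)

# Toy data, invented for illustration: 10 cherries spanning 2 years each,
# 3 of which show an indel.
toy = [(2.0, True)] * 3 + [(2.0, False)] * 7
rate = fit_indel_rate(toy)
```

When every cherry has the same branch length, the estimate has the closed form −ln(1 − k/n)/t, which gives a quick sanity check on the search. Under the same assumptions, dividing the fitted rate by the length of the variable region in nucleotides would give a value in the indels/nt/year units reported above.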