Although estimated to have emerged in humans in Central Africa in the early 1900s, HIV-1, the main causative agent of AIDS, was only discovered in 1983. With very little direct biological data of HIV-1 from before the 1980s, far-reaching evolutionary and epidemiological inferences regarding the long prediscovery phase of this pandemic are based on extrapolations by phylodynamic models of HIV-1 genomic sequences gathered mostly over recent decades. Here, using a very sensitive multiplex RT-PCR assay, we screened 1,652 formalin-fixed paraffin-embedded tissue specimens collected for pathology diagnostics in Kinshasa, Democratic Republic of Congo (DRC), between 1959 and 1967. We report the near-complete genome of one positive from 1966 ("DRC66")-a non-recombinant sister lineage to subtype C that constitutes the oldest HIV-1 near-full-length genome recovered to date. Root-to-tip plots showed the DRC66 sequence is not an outlier as would be expected if dating estimates from more recent genomes were systematically biased; and inclusion of DRC66 sequence in tip-dated BEAST analyses did not significantly alter root and internal node age estimates based on post-1978 HIV-1 sequences. There was larger variation in divergence time estimates among datasets that were subsamples of the available HIV-1 genomes from 1978-2015, showing the inherent phylogenetic stochasticity across subsets of the real HIV-1 diversity. In conclusion, this unique archival HIV-1 sequence provides direct genomic insight into HIV-1 in 1960s DRC, and, as an ancient-DNA calibrator, it validates our understanding of HIV-1 evolutionary history.
SignificanceInferring the precise timing of the origin of the HIV/AIDS pandemic is of great importance because it offers insights into which factors did-or did not-facilitate the emergence of the causal virus. Previous estimates 2 have implicated rapid development during the early 20 th century in Central Africa, which wove onceisolated populations into a more continuous fabric. We recovered the first HIV-1 genome from the 1960s, and it provides direct evidence that HIV-1 molecular clock estimates spanning the last half-century are remarkably reliable. And, because this genome itself was sampled only about a half-century after the estimated origin of the pandemic, it empirically anchors this crucial inference with high confidence.