Human cytomegalovirus is a widespread pathogen of major medical importance. It causes significant morbidity and mortality in immunocompromised individuals, and congenital infections can result in severe disabilities or stillbirth. Development of a vaccine is prioritized, but no candidate is close to release. Although correlations of viral genetic variability with pathogenicity are suspected, knowledge about the strain diversity of the 235-kb genome is still limited. In this study, 96 full-length human cytomegalovirus genomes from clinical isolates were characterized, quadrupling the amount of information available for full-genome analysis. These data provide the first high-resolution map of human cytomegalovirus interhost diversity and evolution. We show that cytomegalovirus is significantly more divergent than all other human herpesviruses and highlight hot spots of diversity in the genome. Importantly, 75% of strains are not genetically intact but contain disruptive mutations in a diverse set of 26 genes, including the immunomodulatory genes UL40 and UL111A. These mutants are independent of culture passage artifacts and circulate in natural populations. Pervasive recombination, which is linked to the widespread occurrence of multiple infections, was found throughout the genome. The recombination density was significantly higher than that of other human herpesviruses and correlated with strain diversity. While the overall effects of strong purifying selection on virus evolution are apparent, evidence of diversifying selection was found in several genes encoding proteins that interact with the host immune system, including UL18, UL40, UL142, and UL147. The residues under diversifying selection may represent phylogenetic signatures of past and ongoing virus-host interactions.
IMPORTANCE Human cytomegalovirus has the largest genome of all viruses that infect humans. Currently, there is great interest in establishing associations between genetic variants and strain pathogenicity of this herpesvirus. Since the number of publicly available full-genome sequences is limited, knowledge about strain diversity is highly fragmented and biased toward a small set of loci. Together with our previous work, we have now contributed 101 complete genome sequences. We have used these data to conduct the first high-resolution analysis of interhost genome diversity, providing an unbiased and comprehensive overview of cytomegalovirus variability. These data are of major value to the development of novel antivirals and a vaccine and to the identification of potential targets for genotype-phenotype experiments. Furthermore, these data have enabled a thorough study of the evolutionary processes that have shaped cytomegalovirus diversity.
Human cytomegalovirus (HCMV), the prototype member of the herpesvirus subfamily Betaherpesvirinae, is a widespread and important pathogen. Seroprevalence in the adult population ranges from 45% to 100% (1). After primary infection, HCMV establishes a lifelong, latent infection in myeloid progenitor cells (2). This virus causes mild to ...