Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused millions of deaths and substantial morbidity worldwide. Intense scientific effort to understand the biology of SARS-CoV-2 has resulted in daunting numbers of genomic sequences. We witnessed evolutionary events that previously could mostly be inferred only indirectly, such as the emergence of variants with distinct phenotypes, for example in transmissibility, severity and immune evasion. This Review explores the mechanisms that generate genetic variation in SARS-CoV-2 and the within-host and population-level processes that underpin these events. We examine the selective forces that likely drove the evolution of higher transmissibility and, in some cases, higher severity during the first year of the pandemic, the role of antigenic evolution during the second and third years, the implications of immune escape and reinfections, and the increasing evidence for, and potential relevance of, recombination. To understand how major lineages, such as variants of concern (VOCs), are generated, we contrast the evidence for the chronic infection model underlying the emergence of VOCs with the possibility of an animal reservoir playing a role in SARS-CoV-2 evolution, and conclude that the ...
Nature Reviews Microbiology
Review article
... virus (HIV; ~10 -4 × 10 -6 mutations per nucleotide per replication cycle), which, unlike coronaviruses, lack a 3′ exonuclease proofreading mechanism in their replication machinery 8,10-12. Insertions and deletions also result from replication errors and can generate diversity. One example is the deletion at positions 69-70 of the spike gene, which is responsible for the S-gene dropout that was instrumental in detecting the SARS-CoV-2 Alpha variant and has been reported to be associated with increased infectivity 13.
In addition to RNA replication errors, host-mediated genome editing by innate cell defence mechanisms may introduce substantial numbers of directed mutations into the SARS-CoV-2 genome, and thus may influence its evolutionary rate. Cellular mutational drivers include members of the apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC) family 14-16, including APOBEC1, APOBEC3A and APOBEC3G, which demonstrate editing activity on numerous DNA virus, RNA virus and retroviral genomes 17,18, including SARS-CoV-2. APOBEC activity has been inferred bioinformatically from observations of a substantial excess of C → U transitions over all other mutations 18,20,21. SARS-CoV-2 genomes may also be edited by a different class of cellular antiviral proteins, the adenosine deaminases that act on RNA (in particular ADAR1), leading to A → G mutations (and U → C mutations in opposite genomic strands) 21,22.
The potential editing-associated C → U mutations in SARS-CoV-2 genome sequences introduce complexities to SARS-CoV-2 evolutionary genomic analysis. C → U mutations account, in part, for the strikingly high ratio of changes at non-synonymous sites to those at synonymous sites in SARS-CoV-2 genomes; the mean dN/dS ratio ...
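
The bioinformatic inference described above, an excess of C → U changes relative to all other substitution types, can be illustrated with a minimal sketch. It is not the method used in the cited studies: the function names `substitution_spectrum` and `c_to_u_fraction` are hypothetical, the two short sequences are invented toy data, and a simple pairwise comparison against a reference is assumed, whereas a real analysis would typically count mutations along a phylogeny to avoid counting the same change more than once.

```python
from collections import Counter


def substitution_spectrum(reference: str, sample: str) -> Counter:
    """Count each substitution type (e.g. 'C>U') between two aligned sequences.

    Hypothetical helper for illustration. Assumes the two sequences are
    already aligned to the same length. DNA-style 'T' is treated as 'U'.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    ref = reference.upper().replace("T", "U")
    alt = sample.upper().replace("T", "U")
    valid = set("ACGU")

    counts = Counter()
    for ref_base, alt_base in zip(ref, alt):
        # Skip gaps, ambiguous bases (e.g. 'N') and unchanged positions.
        if ref_base in valid and alt_base in valid and ref_base != alt_base:
            counts[f"{ref_base}>{alt_base}"] += 1
    return counts


def c_to_u_fraction(counts: Counter) -> float:
    """Fraction of all observed substitutions that are C>U (0.0 if none)."""
    total = sum(counts.values())
    return counts["C>U"] / total if total else 0.0


# Invented toy sequences, not real SARS-CoV-2 data.
toy_reference = "ACCGUCAGCUAC"
toy_sample = "AUCGUUGGUUAC"

spectrum = substitution_spectrum(toy_reference, toy_sample)
print(dict(spectrum))
print(c_to_u_fraction(spectrum))
```

In this toy comparison, three of the four differences are C → U and one is A → G (the ADAR-type change), so C → U dominates the spectrum. Whether such an excess reflects editing would still need to be judged against base composition and sequence context, which this sketch does not model.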