SARS-CoV-2, the causative agent of COVID-19, emerged in late 2019 and caused a global pandemic, with the United Kingdom (UK) among the hardest-hit countries. Rapid sequencing and publication of consensus genomes have enabled phylogenetic analysis of the virus, demonstrating that SARS-CoV-2 evolves relatively slowly 1, but with multiple sites in the genome that appear inconsistent with the overall consensus phylogeny 2. To understand these discrepancies, we used veSEQ 3, a targeted RNA-seq approach, to quantify minor allele frequencies in 413 clinical samples from two UK locations. We show that SARS-CoV-2 infections are characterised by extensive within-host diversity, which is frequently shared among infected individuals with patterns consistent with geographical structure. These results were reproducible in data from other sequencing locations around the UK, where we find evidence of mixed infection by major circulating lineages with patterns that cannot readily be explained by artefacts in the data. We conclude that SARS-CoV-2 diversity is transmissible, and propose that geographic patterns are generated by co-circulation of distinct viral populations. Co-transmission of mixed populations could open opportunities for resolving clusters of transmission and understanding pathogenesis.

symptomatic individuals who tested positive for COVID-19 within two geographically separate hospital trusts (Oxford University Hospitals and Basingstoke and North Hampshire Hospital, located 37 miles (60 km) apart; Supplementary Table 1). Using veSEQ, a sequencing protocol based on a quantitative targeted enrichment strategy 3, which we previously validated for other viruses 3, 11, 12, we characterised the full spectrum of within-host diversity in SARS-CoV-2 and contextualised our findings within other high-quality, publicly available deep-sequencing datasets from the UK generated on the high-fidelity Illumina platform 13, 14.
All genomic data has been made publicly available as part of the COVID-19 Genomics UK (COG-UK) Consortium [cogconsortium.uk] via GISAID 15 and the European Nucleotide Archive (ENA) study PRJEB37886.
Within-host diversity is extensive and shared between individuals

To examine patterns of within-host diversity, we first considered the distribution of minor allele frequencies (MAFs) in the mapped reads at every position along the genome. This analysis was supported by data curation to ensure that only high-confidence variants were examined, which included analysis of in-batch quantification controls as well as a stringent computational clean-up to eliminate any residual cross-mapping 16, previously validated for targeted metagenomics 11 (see Methods and Supplementary Text for a full description). In combination with unique dual indexing (UDI), these procedures generated highly robust minority variant calls, which were reproducible in independent replicates and distinguishable from methodological noise above a threshold of 2% of reads at a given position (Supplementary Figure 1). The distribution of MAFs was a...