The persistent difficulty in culturing environmentally abundant microbes from aquatic ecosystems is an obstacle to disentangling the complex web of ecological interactions among a diverse assortment of participants (pro- and eukaryotes and their viruses). In aquatic microbial communities the numerically most abundant actors, the viruses, remain the most elusive, and in freshwaters especially their identities and ecology remain obscure. Here, using ultra-deep metagenomic sequencing from freshwater habitats […]

[…] Supplementary Table S1). While most samples we sequenced were ca. 54 Gb in size (190–482 million reads, average 368 million reads), the two Římov samples from the epi- and hypolimnion were sequenced to ca. 380 Gb each (2.5 billion reads each). An overview of the microbial community at both sites, based on 16S rRNA abundances, is shown in Supplementary Figure S2.
We also collected an additional 149 publicly available freshwater metagenomes (Supplementary Table S1; 4.04 billion reads, 1.09 Tb in total) to search for complete phage genomes. All datasets were assembled independently (no co-assembly). In total, we analyzed ca. 3 Tb of freshwater metagenomic sequence (ca. 17 billion reads).
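As a sanity check on the totals above, the Python sketch below tallies the sequencing volume. Beyond what is stated here, it assumes that the two ultra-deep Římov samples are among the 23 site samples listed under the phage genome analyses (10 epilimnion, 8 hypolimnion, 5 Jiřická) and that the other 21 samples sit at the stated average of ca. 54 Gb and 368 million reads; the constant and function names are illustrative only.

```python
# Sequencing-volume tally (illustrative; per-sample averages are assumptions).
SITE_SAMPLES = 10 + 8 + 5          # Římov epilimnion + hypolimnion + Jiřická
DEEP_SAMPLES = 2                   # ultra-deep Římov epi- and hypolimnion samples
DEEP_GB, DEEP_READS = 380.0, 2.5e9
AVG_GB, AVG_READS = 54.0, 368e6    # assumed to apply to every other site sample
PUBLIC_GB, PUBLIC_READS = 1090.0, 4.04e9  # 149 public metagenomes (1.09 Tb)


def total_volume() -> tuple[float, float]:
    """Return (gigabases, reads) summed over site and public metagenomes."""
    other = SITE_SAMPLES - DEEP_SAMPLES
    site_gb = DEEP_SAMPLES * DEEP_GB + other * AVG_GB
    site_reads = DEEP_SAMPLES * DEEP_READS + other * AVG_READS
    return site_gb + PUBLIC_GB, site_reads + PUBLIC_READS


gb, reads = total_volume()
print(f"{gb / 1000:.2f} Tb, {reads / 1e9:.1f} billion reads")
```

Under these assumptions the tally comes to about 2.98 Tb and 16.8 billion reads, consistent with the ca. 3 Tb and ca. 17 billion reads reported.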
The number of complete phage genomes recovered from a sample increased with sequencing depth, but with diminishing returns (Figure 1a): genome recovery remained roughly linear up to 100 Gb (ca. 1 phage genome per additional Gb) before tapering off at a maximum of 160 genomes from 400 Gb of sequence data. While a total of 1677 genomes were assembled from the sequence data generated at the two study sites (Římov and Jiřická), 357 genomes were recovered from all other available freshwater metagenomes. This suggests that the potential of ultra-deep sequencing to recover far more phage genomes has not yet been fully realized. We also recovered a number of metagenome-assembled genomes (MAGs) from the Římov metagenome time-series dataset (see below). We refer to this entire collection of genomes as the Uncultured Freshwater Organisms (UFO) dataset, where the UFOv subset refers to viruses and the UFOp subset to prokaryotic genomes.
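The recovery figures above can be made concrete with a little arithmetic. The Python sketch below computes the marginal yield (genomes per additional Gb) between approximate points on the depth–recovery curve and compares genome yield per Tb between the study-site data and the public metagenomes. The curve points are an assumption read off the text (the origin, ca. 100 genomes at 100 Gb implied by ~1 genome per Gb, and the stated maximum of 160 genomes from 400 Gb), the study-site volume (ca. 1.91 Tb) is taken as the ca. 3 Tb total minus the 1.09 Tb of public data, and both function names are hypothetical.

```python
def marginal_yields(points: list[tuple[float, float]]) -> list[float]:
    """Genomes gained per additional Gb between consecutive (depth_gb, genomes) points."""
    pts = sorted(points)
    return [(g2 - g1) / (d2 - d1) for (d1, g1), (d2, g2) in zip(pts, pts[1:])]


def genomes_per_tb(genomes: int, tb: float) -> float:
    """Complete genomes recovered per terabase of sequence."""
    return genomes / tb


# Approximate curve points (assumption): origin, ~1 genome/Gb up to 100 Gb,
# and the stated maximum of 160 genomes from 400 Gb.
curve = [(0.0, 0.0), (100.0, 100.0), (400.0, 160.0)]
print(marginal_yields(curve))  # [1.0, 0.2]

site = genomes_per_tb(1677, 3.0 - 1.09)  # study sites: ca. 3 Tb minus 1.09 Tb public
public = genomes_per_tb(357, 1.09)       # 149 public freshwater metagenomes
print(f"{site:.0f} vs {public:.0f} genomes per Tb ({site / public:.1f}-fold)")
```

Under these assumptions the marginal yield drops from about 1 to about 0.2 genomes per Gb beyond 100 Gb, and the study-site data yield roughly 2.7 times more complete genomes per Tb than the public metagenomes, although this pooled comparison ignores differences in habitat and per-sample depth.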
Phage genome analyses

A total of 598 complete phage genomes were recovered from the Římov epilimnion (10 samples), 800 from the hypolimnion (8 samples) and 279 from Jiřická (5 samples). Upon dereplication (genomes with >95% identity and >95% coverage treated as one; see Methods), these numbers decreased, by nearly three-fold in the case of the hypolimnion, suggesting that nearly identical genomes were captured repeatedly across samplings. We found only a single instance of a phage that was nearly identical in two habitats (Římov and Jiřická).
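The dereplication criterion above (>95% identity over >95% coverage) can be illustrated with a greedy centroid-clustering sketch in Python. The actual procedure is described in the Methods; this sketch hypothetically assumes that pairwise identity and coverage have already been computed by an aligner, that coverage is measured on the query genome, and that longer genomes are preferred as cluster representatives. The function name, genome names and alignment values are illustrative.

```python
from typing import Dict, List, Tuple

# hits[(query, reference)] = (percent identity, percent of query length aligned)
Hits = Dict[Tuple[str, str], Tuple[float, float]]


def dereplicate(lengths: Dict[str, int], hits: Hits,
                min_ident: float = 95.0, min_cov: float = 95.0) -> Dict[str, List[str]]:
    """Greedy centroid clustering: longer genomes are visited first and become
    representatives; a genome joins the first representative it matches with
    >min_ident identity and >min_cov coverage, otherwise it founds a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for genome in sorted(lengths, key=lambda g: (-lengths[g], g)):
        for rep, members in clusters.items():
            ident, cov = hits.get((genome, rep), (0.0, 0.0))
            if ident > min_ident and cov > min_cov:
                members.append(genome)
                break
        else:
            clusters[genome] = [genome]  # no match: genome becomes a representative
    return clusters


# Toy input with hypothetical genome names and alignment values.
lengths = {"phage_A": 40_000, "phage_B": 39_800, "phage_C": 52_000}
hits = {("phage_B", "phage_A"): (99.1, 98.7)}
print(dereplicate(lengths, hits))
# {'phage_C': ['phage_C'], 'phage_A': ['phage_A', 'phage_B']}
```

Only pairs that pass both thresholds collapse into one cluster, which is why repeatedly sampling the same phage population reduces the genome count after dereplication.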
The comparison of the recovered freshwater phage genomes with representative sets of phages from Viral RefSeq (1996 genomes) and from the marine habitat (1335 genomes) [23-25] is shown in Figure 1c. Intriguingly, the genome size distributions of marin...