Nucleocytoplasmic large DNA viruses have the largest genomes among all viruses and infect diverse eukaryotes across various ecosystems, but their expression regulation and infection strategies are not well understood. We profiled single-cell transcriptomes of the worldwide-distributed microalga Emiliania huxleyi and its specific coccolithovirus responsible for massive bloom demise. Heterogeneity in viral transcript levels detected among single cells was used to reconstruct the viral transcriptional trajectory and to map cells along a continuum of infection states. This enabled identification of novel viral genetic programs, which are composed of five kinetic classes with distinct promoter elements. The infection substantially changed the host transcriptome, causing rapid shutdown of protein-encoding nuclear transcripts at the onset of infection, while the plastid and mitochondrial transcriptomes persisted to mid- and late stages, respectively. Single-cell transcriptomics thereby opens the way for tracking host-pathogen infection dynamics at high resolution within microbial communities in the marine environment.
Main text

Nucleocytoplasmic large DNA viruses (NCLDVs) are the largest viruses known today in both genome and virion size. They have been found in most major lineages of eukaryotes across diverse habitats (1-4), especially in the marine environment (5, 6). Among the NCLDVs of special ecological importance are members of the family Phycodnaviridae that infect a wide range of key algal hosts (1). These include the cosmopolitan calcifying eukaryotic alga Emiliania huxleyi (Haptophyta), which forms massive annual blooms in the oceans that have a profound impact on the carbon and sulfur biogeochemical cycles (7). E. huxleyi blooms are frequently terminated by a large dsDNA virus, EhV (8), which enhances nutrient cycling and carbon export to the deep ocean (9, 10). This host-virus model provides a trackable system for understanding viral life cycle strategies and host responses.

High-throughput bulk RNA sequencing (RNA-seq) has been used for whole-genome expression profiling of NCLDVs during infection, shedding light on gene prediction, transcript structure, and changes in metabolic pathways (11-13). However, bulk RNA-seq profiles average gene expression levels across many cells, whereas infection states can be variable among single cells. To overcome this limitation, single-cell RNA-seq (scRNA-seq) approaches have been developed to probe the transcriptomes of individual cells in a highly parallel manner. These methods have revolutionized our understanding of various developmental and immunological processes (14, 15), including host-virus interactions in mammalian systems (16, 17).

Here we employed scRNA-seq to study EhV infection of E. huxleyi at the single-cell level, in order to characterize the temporal dynamics and regulation of viral and host transcriptomes. E. huxleyi CCMP2090 cultures were in...