Ammonia-oxidising archaea of the phylum Thaumarchaeota are keystone species in global nitrogen cycling. However, only three of the six known families of the terrestrially ubiquitous order Nitrososphaerales have representative genomes. Here we provide genomes for the three remaining families and examine the impact of gene duplication, loss and transfer events across the entire phylum. Much of the genomic divergence in this phylum is driven by gene duplication and loss, but we also detected early lateral gene transfer that introduced considerable proteome novelty. In particular, we identified two large gene transfer events into Nitrososphaerales. The fate of gene families originating on these branches was highly lineage-specific: they were lost in some descendant lineages but underwent extensive duplication in others, suggesting niche-specific roles within soil and sediment environments. Overall, our results suggest that lateral gene transfer followed by gene duplication drives Nitrososphaerales evolution, highlighting a previously under-appreciated mechanism of genome expansion in archaea.

… provided a well-supported thaumarchaeotal phylogenomic tree (Figure 1), with most nodes having UF bootstrap values > 95% and SH-aLRT values > 95%. This tree was the best supported of the several phylogenomic reconstruction methodologies we compared (Supplementary Information: Extended phylogenomics). It is broadly similar to previously published work 11, with some exceptions that mainly involve poorly supported basal branches in both trees (Supplementary Information: Extended phylogenomics).
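The support criterion above (both UF bootstrap and SH-aLRT exceeding 95%) can be checked mechanically on a tree file. The following is a minimal Python sketch that assumes IQ-TREE-style internal-node labels of the form `SH-aLRT/UFBoot` (as written when both tests are requested). The function names, thresholds as parameters and the toy tree are hypothetical illustrations and do not reproduce the pipeline used here.

```python
import re
from typing import List, Tuple

# Internal-node labels follow a closing parenthesis in Newick, e.g. ")97.3/100:0.05".
# Assumption: labels use the IQ-TREE "SH-aLRT/UFBoot" convention (both in percent).
_LABEL = re.compile(r"\)([0-9.]+)/([0-9.]+)")


def node_supports(newick: str) -> List[Tuple[float, float]]:
    """Return (SH-aLRT, UFBoot) pairs for every labelled internal node."""
    return [(float(sh), float(uf)) for sh, uf in _LABEL.findall(newick)]


def fraction_well_supported(newick: str, sh_min: float = 95.0, uf_min: float = 95.0) -> float:
    """Fraction of labelled internal nodes whose supports both exceed the thresholds."""
    supports = node_supports(newick)
    if not supports:
        return 0.0
    strong = sum(1 for sh, uf in supports if sh > sh_min and uf > uf_min)
    return strong / len(supports)


# Hypothetical toy tree: taxon names and support values are invented for illustration.
toy = "((A:0.1,B:0.2)99.1/100:0.05,(C:0.1,D:0.3)87.0/96:0.04,E:0.5)100/100;"
print(fraction_well_supported(toy))
```

In the toy tree, two of the three labelled nodes pass both thresholds; the middle clade fails only on SH-aLRT. Requiring both values to pass, rather than either one, mirrors the stricter reading of the criterion stated in the text.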
Previous classifications of Thaumarchaeota have been based on phylogenetic analyses of the ammonia monooxygenase (amoA) gene, but this gene is not present in all members of the phylum. We therefore propose a phylum-wide thaumarchaeotal classification, based on our phylogenomic analysis, that maintains maximum consistency with previous work while incorporating the early-diverging lineages that lack amoA 2, 17, 18 (Table S2). The thaumarchaeotal genomes in our dataset represent a diverse phylum comprising 8 classes, 10 orders, 28 families, 31 genera and 103 species. Although the classically used thaumarchaeotal nomenclature is largely congruent with this taxonomic stratification, we observed a few exceptions. For example, the order Nitrosopumilales encompasses Candidatus Nitrosotalea and Cenarchaeum, which were previously suggested to represent orders of their own, and Nitrososphaerales contains at least 8 genera, with Ca. Nitrososphaera gargensis, Nitrososphaera viennensis and Ca. Nitrososphaera evergladensis belonging to different genera.

We also used the Genome Taxonomy Database Toolkit (GTDB-Tk) to evaluate the genomic diversity of our 12 new MAGs. While one of these genomes (TH1173) is a close relative of a published genome, MY3, the range of relative evolutionary divergence (RED) 19 values for the other 11 MA...