The genomes of different individuals of the same prokaryote species can vary widely in gene content. They contain different proportions of core genes, which are present in all genomes, and accessory genes, whose presence varies between genomes. Together, these core and accessory genes make up a species' pangenome. The reasons behind this extensive diversity in gene content remain elusive, and there is an ongoing debate about the contribution of accessory genes to fitness, that is, whether their presence is on average advantageous, neutral, or deleterious. To explore this issue, we developed a mathematical model that simulates the gene content of prokaryote genomes and pangenomes. The model tests how the fitness effects of genes and their rates of gain and loss affect the properties of pangenomes. We first show that pangenomes with large numbers of low-frequency genes can arise from the gain and loss of neutral and nearly neutral genes in a population. However, pangenomes with large numbers of highly beneficial, low-frequency genes can also arise from genotype-by-environment interactions when multiple niches are available to a species. Finally, pangenomes can arise irrespective of the fitness effects of the gained and lost genes, as long as gene gain and loss rates are high. We argue that, to understand the contribution of different mechanisms to pangenome diversity, it is crucial to have empirical information on population structure, genotype-by-environment interactions, the distributions of fitness effects, and the rates of gene gain and loss in different prokaryote groups.
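The kind of simulation described above can be illustrated with a minimal Wright-Fisher sketch. This is not the authors' model. The fixed population size, the per-genome gain rate, the per-gene loss rate, a single selection coefficient `s` shared by all accessory genes, multiplicative fitness, and a fixed block of core genes that are never lost are all illustrative assumptions. The function names `simulate_pangenome` and `frequency_spectrum` are hypothetical.

```python
import random
from collections import Counter


def simulate_pangenome(n_genomes=50, generations=500, gain_rate=0.5,
                       loss_rate=0.02, s=0.0, n_core=10, seed=1):
    """Toy Wright-Fisher simulation of accessory-gene gain and loss.

    Illustrative assumptions, not the published model: fixed haploid
    population size, every gained gene is new (infinitely many genes),
    all accessory genes share one selection coefficient `s` (s > -1),
    fitness is multiplicative, and core genes are never lost.
    """
    rng = random.Random(seed)
    core = frozenset(range(-n_core, 0))  # negative IDs mark core genes
    population = [core] * n_genomes
    next_gene_id = 0  # non-negative IDs mark accessory genes
    for _ in range(generations):
        # Selection: parents are drawn with probability proportional to fitness.
        weights = [(1.0 + s) ** (len(g) - n_core) for g in population]
        parents = rng.choices(population, weights=weights, k=n_genomes)
        offspring = []
        for genome in parents:
            # Loss: each accessory gene is deleted independently; core genes are kept.
            kept = {g for g in genome if g < 0 or rng.random() >= loss_rate}
            # Gain: at most one brand-new gene per genome per generation.
            if rng.random() < gain_rate:
                kept.add(next_gene_id)
                next_gene_id += 1
            offspring.append(frozenset(kept))
        population = offspring
    return population


def frequency_spectrum(population):
    """Map k -> number of genes present in exactly k genomes."""
    per_gene = Counter(g for genome in population for g in genome)
    return dict(sorted(Counter(per_gene.values()).items()))


pop = simulate_pangenome()
print(frequency_spectrum(pop))
```

With `s=0.0`, the mean number of accessory genes per genome approaches `gain_rate / loss_rate` (about 25 with these defaults), because each generation a genome keeps a fraction `1 - loss_rate` of its accessory genes and gains `gain_rate` new ones on average. The accessory genes then typically sit in few genomes, while the core genes occupy the top frequency class. Varying `s` and the two rates is one way to probe the scenarios listed above.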
Pangenomes arise as a consequence of gene acquisition via horizontal gene transfer (HGT) and gene loss [2]. These processes ultimately result in general patterns observed across prokaryotes. These include an increase in the number of known accessory genes as more genomes from the same species are sequenced, along with a slower decrease in the number of core genes, and a U-shaped gene frequency distribution, or spectrum [2,10,11]. The U shape of this distribution indicates that a large proportion of genes are present in a single genome or in very few genomes, few genes are present at intermediate frequencies, and a substantial proportion are core genes. Although the shape of the gene frequency distribution is broadly similar across prokaryotic life, different prokaryote species show significant differences in the proportions of core and accessory genes [12-14].
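The patterns in this paragraph (core versus accessory genes, the gene frequency spectrum, and the growth of the pangenome as genomes are added) can be computed directly from gene presence/absence data. The sketch below is a minimal illustration under stated assumptions. Each genome is given as a set of gene-family identifiers with orthology already resolved, the toy gene names are hypothetical, and the accumulation curves are averaged over random genome orderings, which is one common choice rather than a method taken from this paper.

```python
import random
from collections import Counter


def core_and_accessory(genomes):
    """Split the pangenome (union of all genes) into core and accessory genes."""
    pangenome = set().union(*genomes)
    core = set.intersection(*map(set, genomes))
    return core, pangenome - core


def gene_frequency_spectrum(genomes):
    """Map k -> number of genes found in exactly k of the genomes."""
    per_gene = Counter(gene for genome in genomes for gene in set(genome))
    return dict(sorted(Counter(per_gene.values()).items()))


def accumulation_curves(genomes, n_orders=100, seed=0):
    """Mean pangenome and core size after adding 1..n genomes in random order."""
    rng = random.Random(seed)
    n = len(genomes)
    pan_totals = [0.0] * n
    core_totals = [0.0] * n
    for _ in range(n_orders):
        order = rng.sample(range(n), n)  # one random ordering of the genomes
        pan, core = set(), None
        for i, idx in enumerate(order):
            genes = set(genomes[idx])
            pan |= genes  # pangenome can only grow
            core = genes if core is None else core & genes  # core can only shrink
            pan_totals[i] += len(pan)
            core_totals[i] += len(core)
    return ([p / n_orders for p in pan_totals],
            [c / n_orders for c in core_totals])


# Hypothetical toy data: three genomes sharing three core genes.
genomes = [
    {"core_a", "core_b", "core_c", "acc_x"},
    {"core_a", "core_b", "core_c", "acc_y"},
    {"core_a", "core_b", "core_c", "acc_x", "acc_z"},
]
print(gene_frequency_spectrum(genomes))  # → {1: 2, 2: 1, 3: 3}
```

The toy spectrum already has the U shape in miniature: two genes in a single genome, one at an intermediate frequency, and three core genes. For each random ordering the pangenome size never decreases and the core size never increases, so the averaged curves trace the trends described in the paragraph.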
Although we know that HGT and gene loss result in pangenomes, it is less clear which forces lead to high diversity in gene content and to U-shaped gene frequency distributions.
According to mathematical models, the gain and loss of entirely neutral genes along a simulated phylogeny or population can, in principle, predict the U-shaped gene frequency distributions of pangenomes [11,15]. However, the fit to real data i...