Water bloom development due to eutrophication constitutes a case of niche specialization among planktonic cyanobacteria, but the genomic repertoire allowing bloom formation in only some species has not been fully characterized. We posited that the habitat relevance of a trait begets its underlying genomic complexity, so that traits within the repertoire would be differentially more complex in species successfully thriving in that habitat than in closely related species that cannot. To test this for the case of bloom-forming cyanobacteria, we curated 17 potentially relevant query metabolic pathways and five core pathways, selected according to existing ecophysiological literature. The 113 available genomes were split into those of blooming (45) and nonblooming (68) strains, and an index of genomic complexity for each strain’s version of each pathway was derived. We show that strain versions of all query pathways were significantly more complex in bloomers, with complexity correlating positively with strain blooming incidence in 14 of those pathways. The five core pathways, relevant in all habitats, showed no differential complexity or correlations. Gas vesicle, toxin, and fatty acid synthesis, amino acid uptake, and C, N, and S acquisition systems were most strikingly relevant in the blooming repertoire. Further, we validated our findings using metagenomic gene expression analyses of blooming and nonblooming cyanobacteria in natural settings, where pathways in the repertoire were differentially overexpressed according to their relative complexity in bloomers, but not in nonbloomers. We expect that this approach may find applications in other habitats and organismal groups.
IMPORTANCE We pragmatically delineate the trait repertoire that enables organismal niche specialization. We based our approach on the tenet, derived from evolutionary and complex-system considerations, that genomic units that can significantly contribute to fitness in a certain habitat will be comparatively more complex in organisms specialized to that habitat than their genomic homologs found in organisms from other habitats. We tested this in cyanobacteria forming harmful water blooms, for which decades of research in ecological physiology and genomics exist. Our results essentially confirm that genomics and ecology can be linked through comparative complexity analyses, providing a tool that should be generally applicable to any group of organisms and any habitat, and enabling grounded hypotheses regarding the ecogenomic basis for diversification.