Due to their submerged and cryptic lifestyle, the vast majority of fungal species are difficult to observe and describe morphologically, and many remain known to science only from sequences detected in environmental samples. The lack of established practices for delimiting and naming most fungal species severely limits communication about, and interpretation of, ecology and evolution in kingdom Fungi. Here, we use environmental sequence data as taxonomic evidence and combine phylogenetic and ecological data to generate and test species hypotheses in the class Archaeorhizomycetes (Taphrinomycotina, Ascomycota). Based on environmental amplicon sequencing from a well-studied Swedish pine forest podzol soil, we generate 68 distinct species hypotheses of Archaeorhizomycetes, of which two correspond to the only described species in the class. Nine of the species hypotheses represent 78% of the sequenced Archaeorhizomycetes community and are supported by long-read data that form the backbone for delimiting species hypotheses based on phylogenetic branch lengths.
Soil fungal communities are shaped by environmental filtering and competitive exclusion, so that closely related species are less likely to co-occur in a niche if adaptive traits are evolutionarily conserved. In soil profiles, distinct vertical horizons represent a testable niche dimension, and we found significantly different distributions across samples for a well-supported pair of sister species hypotheses. Based on the combination of phylogenetic and ecological evidence, we identify two novel species for which we provide molecular diagnostics and propose names. While environmental sequences cannot be automatically translated to species, they can be used to generate phylogenetically distinct species hypotheses that can be further tested using sequences as ecological evidence. We conclude that in the case of abundantly and frequently observed species, environmental sequences can support species recognition in the absence of physical specimens, while rare taxa remain uncaptured at our sampling and sequencing intensity.