Background: A holobiont is defined as a host together with its microbiota, and there is no consensus on whether holobionts constitute a unit of selection. The “it’s the song, not the singer” theory proposes that functional traits, rather than taxonomic composition, could be preserved across generations if interspecies interaction patterns perpetuate themselves. We tested this theory with a novel combination of community-level analyses of the functional composition of microbiota communities, using both empirical and simulated data. We tested the conservation of functional composition across generations using mosquito and plant datasets. We then tested whether functional composition changes over time within a generation using human datasets. Finally, we simulated microbiota communities with different numbers of pairwise interspecies interactions and different initial configurations to investigate whether these interactions can lead to multiple stable community compositions.

Results: Our results suggest that the vertically transmitted microbiota initiates a predictable change over time in the functions performed by the microbiota (i.e., an ecological succession), whose robustness depends on the arrival of diverse migrants. This succession culminates in a stable functional composition. Pairwise interactions between the species of the community are not sufficient to explain the stability of the final community and the existence of alternative stable states, which suggests that host-microbiota interactions, and non-pairwise interactions in general, contribute substantially to the robustness of the final community.

Conclusions: If the proposed mechanism proves valid for a diverse array of host species, this would support treating holobionts as units of selection and suggest that the concept has wider applicability, including in animal breeding.
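
To make the simulation step described above more concrete (communities with pairwise interspecies interactions started from different initial configurations), the following is a minimal sketch that assumes a generalized Lotka-Volterra model integrated with a simple Euler scheme. The abstract does not state which model was used, so the model choice, the function name `simulate_glv`, and all parameter values are assumptions made purely for illustration. The toy example only shows how, in such a model, the initial configuration can determine which stable composition is reached; it says nothing about whether pairwise interactions suffice in real microbiota.

```python
def simulate_glv(growth, interactions, initial, dt=0.01, steps=5000):
    """Euler integration of dx_i/dt = x_i * (r_i + sum_j A_ij * x_j).

    growth       -- list of intrinsic growth rates r_i (hypothetical values)
    interactions -- matrix A of pairwise interaction coefficients (hypothetical)
    initial      -- list of starting abundances
    """
    x = list(initial)
    n = len(x)
    for _ in range(steps):
        dx = [
            x[i] * (growth[i] + sum(interactions[i][j] * x[j] for j in range(n)))
            for i in range(n)
        ]
        # Clamp at zero so abundances never become negative.
        x = [max(x[i] + dt * dx[i], 0.0) for i in range(n)]
    return x


# Two species that inhibit each other more strongly than they inhibit themselves.
growth = [1.0, 1.0]
interactions = [[-1.0, -2.0],
                [-2.0, -1.0]]

# Same interaction matrix, two different initial configurations.
state_a = simulate_glv(growth, interactions, [0.6, 0.4])
state_b = simulate_glv(growth, interactions, [0.4, 0.6])

print("start [0.6, 0.4] ->", [round(v, 3) for v in state_a])
print("start [0.4, 0.6] ->", [round(v, 3) for v in state_b])
```

In this toy setting the species that starts with the higher abundance excludes the other, so the two runs settle into different final compositions. A study of functional composition would additionally need a mapping from species to functions, which is not sketched here.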