Birdsong is a longstanding model system for studying evolution and has recently emerged as a measure of biodiversity loss due to deforestation and climate change. Here, we collected and analyzed high-quality song recordings from seven species in the family Estrildidae. We measured the acoustic features of syllables and then used dimensionality reduction and machine learning classifiers to identify features that accurately assigned syllables to species. Species differences were captured by the first three principal components, which corresponded to basic spectral features, spectral shape, and spectrotemporal features. We then identified which measured features underlay classification accuracy: fundamental frequency, mean frequency, spectral flatness, and syllable duration were the most informative features for species identification. Next, we tested whether specific acoustic features of species' songs predicted phylogenetic distance. We found significant phylogenetic signal in syllable spectral features, but not in spectral shape or spectrotemporal features. These results indicate that spectral features are more strongly constrained by species' genetics than other features are, and that they are the most reliable features for identifying species from song recordings. The absence of phylogenetic signal in spectral shape and spectrotemporal features suggests that these song features are labile, reflecting learning processes and individual recognition.
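The dimensionality-reduction step can be illustrated with a minimal sketch of principal component analysis (PCA) on standardized syllable measurements. This is not the authors' pipeline. It assumes a plain syllable-by-feature matrix, z-scoring of each feature, and PCA computed through NumPy's SVD. The function name `pca_scores` and the synthetic three-species data are hypothetical.

```python
import numpy as np


def pca_scores(features: np.ndarray, n_components: int = 3):
    """Project syllables (rows) onto the first principal components.

    Hypothetical helper. `features` is an (n_syllables, n_features) matrix of
    acoustic measurements, e.g. fundamental frequency, mean frequency,
    spectral flatness, and syllable duration.
    """
    # Z-score each feature so that measurements in Hz and seconds are comparable.
    mean = features.mean(axis=0)
    std = features.std(axis=0, ddof=1)
    z = (features - mean) / std
    # SVD of the standardized matrix; rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    # Fraction of total variance explained by each component.
    explained = s**2 / np.sum(s**2)
    scores = z @ vt[:n_components].T  # syllable coordinates on the PCs
    loadings = vt[:n_components]      # feature weights for each PC
    return scores, loadings, explained[:n_components]


# Synthetic example (not study data): 3 hypothetical species, 20 syllables each,
# 4 features, with species-specific offsets added to random noise.
rng = np.random.default_rng(0)
species = np.repeat([0, 1, 2], 20)
offsets = np.array([[0.0, 0.0, 0.0, 0.0],
                    [2.0, 1.5, 0.0, 0.0],
                    [0.0, 0.0, 2.0, 1.0]])
X = rng.normal(size=(60, 4)) + offsets[species]
scores, loadings, explained = pca_scores(X, n_components=3)
print(explained)
```

Species separation could then be inspected by plotting the first columns of `scores` grouped by species, or by passing `scores` to a classifier. Inspecting `loadings` shows which measured features dominate each component.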