Physiological studies in vocal animals such as songbirds indicate that vocalizations drive auditory neurons particularly well. But the neural mechanisms whereby vocalizations are encoded differently from other sounds in the auditory system are unknown. We used spectrotemporal receptive fields (STRFs) to study the neural encoding of song versus the encoding of a generic sound, modulationlimited noise, by single neurons and the neuronal population in the zebra finch auditory midbrain. The noise was designed to match song in frequency, spectrotemporal modulation boundaries, and power. STRF calculations were balanced between the two stimulus types by forcing a common stimulus subspace. We found that 91% of midbrain neurons showed significant differences in spectral and temporal tuning properties when birds heard song and when birds heard modulation-limited noise. During the processing of noise, spectrotemporal tuning was highly variable across cells. During song processing, the tuning of individual cells became more similar; frequency tuning bandwidth increased, best temporal modulation frequency increased, and spike timing became more precise. The outcome was a population response to song that encoded rapidly changing sounds with power and precision, resulting in a faithful neural representation of the temporal pattern of a song. Modeling responses to song using the tuning to modulation-limited noise showed that the population response would not encode song as precisely or robustly. We conclude that stimulus-dependent changes in auditory tuning during song processing facilitate the high-fidelity encoding of the temporal pattern of a song.