Speech is a distinctive feature of our species. It is the default channel for language and constitutes our primary mode of social communication. Determining the evolutionary origins of speech is challenging, in large part because speech appears to be unique in the animal kingdom. However, direct comparisons between speech and other forms of acoustic communication, in both humans (music) and animals (vocalization), suggest that important components of speech are shared across domains and species. In this review, we focus on a single aspect of speech, temporal patterning, and examine similarities and differences across speech, music, and animal vocalization. We structure the comparison around three specific functions of temporal patterning across domains: (1) emotional expression, (2) social interaction, and (3) unit identification. We hypothesize an evolutionary trajectory wherein the ability to identify units within a continuous stream of vocal sounds derives from social vocal interaction, which, in turn, derives from vocal emotional communication. This hypothesis implies that unit identification has parallels in music and precursors in animal vocal communication. Through this review, we demonstrate the potential of comparisons between fundamental domains of biological acoustic communication to provide insight into the evolution of language.