Numerous species possess cortical regions that are most sensitive to vocalizations produced by their own kind (conspecifics). In humans, the superior temporal sulci (STS) putatively represent homologous voice-sensitive areas of cortex. However, STS regions have recently been reported to represent auditory experience or “expertise” in general rather than showing exclusive sensitivity to human vocalizations per se. Using functional magnetic resonance imaging and a unique non-stereotypical category of complex human non-verbal vocalizations – human-mimicked versions of animal vocalizations – we found a cortical hierarchy in humans optimized for processing meaningful conspecific utterances. This left-lateralized hierarchy originated near primary auditory cortices and progressed into traditional speech-sensitive areas. These results suggest that the cortical regions supporting vocalization perception are initially organized by sensitivity to the human vocal tract in stages prior to the STS. Additionally, these findings have implications for the developmental time course of conspecific vocalization processing in humans as well as its evolutionary origins.