Background: Emotional prosody results from the dynamic variation of the acoustic, nonverbal aspects of language that allow people to convey and recognize emotions. The goal of the present paper is to understand how this recognition develops from childhood to adolescence. We also aim to test the maturation of the ability to perceive mixed emotions in the voice. Methods: We tested 133 children and adolescents, aged 6 to 17 years, who were exposed to linguistically meaningless stimuli spoken with four kinds of emotional prosody (anger, fear, happiness, and sadness) and with neutral prosody. Participants were asked to judge the type and degree of perceived emotion on continuous scales. Results: As predicted, a general linear mixed model analysis revealed a significant interaction between age and emotion. The ability to recognize emotions increased significantly with age for all emotional and neutral vocalizations. Girls recognized anger better than boys did, whereas boys confused fear with neutral prosody more often than girls did. Across all ages, sadness was more difficult to recognize than anger, happiness, and neutral prosody, although these differences were only marginally significant. Finally, as age increased, participants were significantly more likely to attribute mixed emotions to emotional prosody, showing a progressive complexification of the representation of the emotional content perceived in emotional prosody. Conclusions: The ability to identify basic emotions from linguistically meaningless stimuli develops from childhood to adolescence. Interestingly, this maturation was evident not only in the accuracy of emotion detection but also in the increasing complexity of emotion attribution in prosody.