Musicians outperform non‐musicians in vocal emotion perception, likely because of increased sensitivity to acoustic cues, such as fundamental frequency (F0) and timbre. Yet, how musicians make use of these acoustic cues to perceive emotions, and how they might differ from non‐musicians, is unclear. To address these points, we created vocal stimuli that conveyed happiness, fear, pleasure or sadness, either in all acoustic cues, or selectively in either F0 or timbre only. We then compared vocal emotion perception performance between professional/semi‐professional musicians (N = 39) and non‐musicians (N = 38), all socialized in Western music culture. Compared to non‐musicians, musicians classified vocal emotions more accurately. This advantage was seen in the full and F0‐modulated conditions, but was absent in the timbre‐modulated condition indicating that musicians excel at perceiving the melody (F0), but not the timbre of vocal emotions. Further, F0 seemed more important than timbre for the recognition of all emotional categories. Additional exploratory analyses revealed a link between time‐varying F0 perception in music and voices that was independent of musical training. Together, these findings suggest that musicians are particularly tuned to the melody of vocal emotions, presumably due to a natural predisposition to exploit melodic patterns.