We investigated the cortical representation of emotional prosody in normal‐hearing listeners using functional near‐infrared spectroscopy (fNIRS) and behavioural assessments. Consistent with previous reports, listeners relied most heavily on F0 cues when recognising vocal emotions; performance was relatively poor, and highly variable between listeners, when only intensity and speech‐rate cues were available. Using fNIRS to image cortical activity in response to speech utterances containing natural and reduced prosodic cues, we found right superior temporal gyrus (STG) to be most sensitive to emotional prosody, but found no emotion‐specific cortical activations. This suggests that while fNIRS might be suited to investigating cortical mechanisms supporting speech processing, it is less suited to investigating cortical haemodynamic responses to individual vocal emotions. Manipulating emotional speech to render F0 cues less informative, we found the amplitude of the haemodynamic response in right STG to be significantly correlated with listeners' ability to recognise vocal emotions from speech with uninformative F0 cues. Specifically, listeners better able to assign emotions to speech with degraded F0 cues showed lower haemodynamic responses to these degraded signals. This suggests a potential objective measure of behavioural sensitivity to vocal emotions that might benefit neurodiverse populations less sensitive to emotional prosody, as well as hearing‐impaired listeners, many of whom rely on listening technologies such as hearing aids and cochlear implants. These technologies do not restore, and often further degrade, the F0 cues essential to parsing emotional prosody conveyed in speech.