Previous studies on noise-vocoded speech showed that the temporal modulation cues provided by the temporal envelope play an important role in the perception of vocal emotion. However, the exact role that the temporal envelope and its modulation components play in the perceptual processing of vocal emotion is still unknown. To clarify the exact features that the temporal envelope contributes to the perception of vocal emotion, a method based on the mechanism of modulation frequency analysis in the auditory system is necessary. In this study, auditory-based modulation spectral features were used to account for the perceptual data collected from vocalemotion recognition experiments using noise-vocoded speech. An auditory-based modulation filterbank was used to calculate the modulation spectrogram of noise-vocoded speech stimuli, and ten types of modulation spectral features were then extracted from the modulation spectrograms. The results showed that there were high similarities between modulation spectral features and the perceptual data of vocal-emotion recognition experiments. It was shown that the modulation spectral features are useful for accounting for the perceptual processing of vocal emotion with noise-vocoded speech.