This chapter provides a perspective on how the latest EEG evidence illuminates the neurophysiological and neurocognitive mechanisms underlying the recognition of socioemotional expression conveyed in human speech and voice, drawing upon event-related potential (ERP) studies. The human voice can encode emotional meaning through different vocal parameters in words, real vs. pseudo-speech, and vocalizations. Based on ERP findings, recent developments of the three-stage model of vocal processing have highlighted initial- and late-stage processing of vocal emotional stimuli. These processes, depending on which ERP components they are mapped onto, can be divided into acoustic analysis, relevance and motivational processing, fine-grained meaning analysis/integration/access, and higher-level social inference, as they unfold over time. ERP studies on vocal socioemotions, such as happiness, anger, fear, sadness, neutrality, sincerity, confidence, and sarcasm in the human voice and speech, have employed different experimental paradigms, including cross-splicing, cross-modality priming, oddball, and Stroop tasks. Moreover, task demands and listener characteristics affect the neural responses underlying the decoding processes, revealing the role of attention deployment and interpersonal sensitivity in the neural decoding of vocal emotional stimuli. Cultural orientation also affects our ability to decode emotional meaning in the voice. Neurophysiological patterns have been compared between normal and abnormal emotional processing of vocal expressions, especially in schizophrenia and congenital amusia. Future directions highlight the study of human vocal expression in alignment with other nonverbal cues, such as facial expressions and body language, and the need to synchronize listeners' brain potentials with other peripheral measures.