This work addresses the scarcity of Cantonese speech emotion datasets by introducing a dedicated dataset together with a feature set tailored to Cantonese that captures intricate emotional expressions. A model based on a self-normalization network improves the efficiency of Cantonese speech emotion recognition. The model achieves an accuracy of 92.3% on the Cantonese dataset and generalizes robustly across diverse Chinese and English datasets. These results underscore the potential of this research in domains such as Cantonese language education, psychological counseling, and voice assistants. The work also advances the understanding of Cantonese emotional expressions and contributes to the preservation of linguistic and cultural heritage.

Despite these achievements, limitations remain in dataset coverage and emotion variety. Future work will prioritize expanding the dataset's breadth and incorporating a wider range of emotional expressions. It will also investigate multimodal approaches that combine audio, visual, and textual cues to achieve more comprehensive Cantonese emotion recognition. These efforts aim to address the current limitations and move the field toward a more nuanced understanding of Cantonese emotional communication.