This work aims to improve emotion recognition in the tourism industry during emergency events and thereby enhance management efficiency. Drawing on the principles of emergency events and emotion recognition, it first outlines the basic emotion recognition process. It then constructs an emotion recognition model based on a Convolutional Neural Network (CNN) and proposes a procedure for extracting emotions with a three-dimensional (3D) CNN. Finally, an attention mechanism is used to optimize the 3D CNN model, and the accuracy and precision of the models are compared on the Bimodal Face and Body Gesture Database (FABO). The results show that the optimized 3D CNN model has a lower error rate in recognizing anxiety, leaving only 3 samples unrecognized, and that its overall recognition performance exceeds that of the other models. Recognition accuracy varies among the models, but the optimized 3D CNN model generally performs well across datasets, achieving accuracies of 83.92%, 82.33%, and 88.81%. It also achieves higher precision than the other models across emotions, particularly for anger, disgust, and happiness, with precision of 97%, 91%, and 94%, respectively. This work improves the accuracy and efficiency of emotion recognition and provides more intelligent and effective support for emergency event management in the tourism industry.
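The abstract does not give the network configuration or say which attention variant is used. The following is therefore only a minimal pure-Python sketch of the general idea under stated assumptions. A single-channel 3D convolution (stride 1, no padding, ReLU) slides over a clip indexed as time × height × width. A softmax temporal-attention step then weights each frame's feature vector before pooling. The function names and the scoring vector `w` are hypothetical illustrations and are not the authors' method. A real model would learn `w` and the kernel and use many channels and layers.

```python
import math
from typing import List, Tuple

Volume = List[List[List[float]]]  # indexed as [time][row][col]


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D cross-correlation (as in CNN libraries), stride 1, no padding, ReLU."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = 0.0
                # Sum over the spatio-temporal window, so motion across frames is captured.
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            s += volume[t + a][i + b][j + c] * kernel[a][b][c]
                row.append(max(0.0, s))  # ReLU activation
            plane.append(row)
        out.append(plane)
    return out


def frame_features(fmap: Volume) -> List[List[float]]:
    """Flatten each time slice of the feature map into one vector per frame."""
    return [[v for row in plane for v in row] for plane in fmap]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def temporal_attention(feats: List[List[float]], w: List[float]) -> Tuple[List[float], List[float]]:
    """Score each frame with a (hypothetical) vector w, softmax over time, return weighted sum."""
    scores = [sum(f_d * w_d for f_d, w_d in zip(f, w)) for f in feats]
    alpha = softmax(scores)
    dim = len(feats[0])
    pooled = [sum(alpha[t] * feats[t][d] for t in range(len(feats))) for d in range(dim)]
    return pooled, alpha


# Toy clip: 4 frames of 4x4 pixels, averaging 2x2x2 kernel (illustrative values only).
clip = [[[float(t + i + j) for j in range(4)] for i in range(4)] for t in range(4)]
kernel = [[[0.125] * 2 for _ in range(2)] for _ in range(2)]
fmap = conv3d_valid(clip, kernel)
feats = frame_features(fmap)
pooled, alpha = temporal_attention(feats, [0.1] * len(feats[0]))
print(len(fmap), len(fmap[0]), len(fmap[0][0]))  # 3 3 3
print([round(a, 3) for a in alpha])
```

In this toy setup, later frames have larger activations and so receive larger attention weights. The design choice being illustrated is that attention lets the pooled clip descriptor emphasize the frames most indicative of an emotion instead of averaging all frames equally.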