With the rapid development of deep learning, speech recognition has emerged as an essential tool in the domain of emotion analysis. These technologies can analyze and recognize subtle variations in human emotion, enriching the emotional dimension of human-computer interaction. However, existing emotion speech recognition models often prove vulnerable to carefully crafted adversarial attacks. To address this challenge, an adversarial training strategy based on the Fast Gradient Sign Method (FGSM) is proposed to enhance the robustness of emotion speech recognition systems. In a series of experiments, adversarial training notably improved the resilience of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models to adversarial attacks while maintaining high recognition accuracy. Specifically, the method increased overall robustness under such attacks by approximately 7% for the LSTM model and 3.5% for the CNN model, with a concomitant reduction in the misrecognition rate, confirming the efficacy of adversarial training in strengthening model security. This study not only demonstrates the potential of adversarial training to improve the security of LSTM and CNN models but also opens new avenues for the design and refinement of future emotion speech recognition systems.
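As a concrete illustration of the general technique the abstract names, the sketch below shows one FGSM adversarial training step in PyTorch. It is a minimal sketch under stated assumptions, not the paper's implementation: the perturbation budget `epsilon`, the clean/adversarial loss mix `alpha`, and the assumption that the model consumes a batched feature tensor (e.g., MFCC frames) are all illustrative choices not specified in this section.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 loss_fn: nn.Module, epsilon: float = 0.01) -> torch.Tensor:
    """Craft an FGSM adversarial example: x_adv = x + epsilon * sign(grad_x L).

    epsilon is an assumed perturbation budget, not a value from the paper.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Take one signed-gradient step on the input and detach from the graph.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def adversarial_training_step(model: nn.Module, optimizer: torch.optim.Optimizer,
                              x: torch.Tensor, y: torch.Tensor,
                              loss_fn: nn.Module,
                              epsilon: float = 0.01, alpha: float = 0.5) -> float:
    """One training step mixing clean and FGSM-adversarial losses.

    alpha weights the clean vs. adversarial loss; 0.5 is an assumption.
    """
    model.train()
    x_adv = fgsm_perturb(model, x, y, loss_fn, epsilon)
    optimizer.zero_grad()  # clears gradients accumulated while crafting x_adv
    loss = alpha * loss_fn(model(x), y) + (1 - alpha) * loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this formulation the defended model (an LSTM or CNN emotion classifier, in the paper's setting) sees both the clean batch and its FGSM-perturbed counterpart at every step, which is what trades a small amount of clean accuracy for the robustness gains the abstract reports.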