The research focuses on developing a novel method for the automatic recognition of human psychoemotional states (PES) using deep learning. The method analyzes speech signals to classify distinct emotional states. The primary challenge addressed is the accurate multiclass classification of seven human psychoemotional states: joy, fear, anger, sadness, disgust, surprise, and a neutral state. Traditional methods have struggled to distinguish these subtle emotional nuances in speech. The study developed a model that extracts informative features from audio recordings, specifically mel spectrograms and mel-frequency cepstral coefficients (MFCCs). These features were then used to train two deep convolutional neural networks, resulting in a classifier model. The novelty of this research lies in its dual-feature approach combined with deep convolutional neural networks for classification. The approach demonstrated high accuracy in emotion recognition, reaching 0.93 on the validation subset. The model's high accuracy and effectiveness can be attributed to the complementary use of mel spectrograms and MFCCs, which together provide a more nuanced representation of emotional expression in speech. The method has broad applicability, including human-machine interfaces, the aviation industry, healthcare, marketing, and other fields where understanding human emotions through speech is crucial.
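
As a concrete illustration of the dual-feature idea, the sketch below shows how a mel spectrogram and an MFCC matrix could be extracted from one recording. It assumes the Python librosa library and illustrative parameter values (sampling rate, number of mel bands, number of coefficients); the study does not specify its exact extraction pipeline, so this is not the authors' implementation.

```python
# Minimal sketch of dual-feature extraction (log-mel spectrogram + MFCCs).
# Assumptions: librosa is used; n_mels=128, n_mfcc=40 and the 16 kHz sampling
# rate are illustrative choices, not values reported in the study.
import numpy as np
import librosa

def extract_features(path, sr=16000, n_mels=128, n_mfcc=40):
    """Return a log-mel spectrogram and an MFCC matrix for one audio file."""
    y, sr = librosa.load(path, sr=sr)                        # load and resample
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)           # mel spectrogram in dB
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # cepstral coefficients
    return log_mel, mfcc
```

Under this scheme, each feature matrix would be fed to its own deep convolutional network, with the two networks' outputs combined into a seven-class classifier covering joy, fear, anger, sadness, disgust, surprise, and the neutral state.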