Facial expressions play a critical role in human communication by conveying a wide range of emotions. Recent advances in artificial intelligence and computer vision have made deep neural networks powerful tools for recognizing facial emotions. In a prior study, we introduced EmoNeXt, a deep learning framework for facial expression recognition based on a modified ConvNeXt architecture with several targeted enhancements, and initially evaluated it on the FER2013 dataset. This paper extends the evaluation of EmoNeXt to additional well-known benchmark datasets, including AffectNet and CK+, to further validate its robustness and generalizability. EmoNeXt integrates a Spatial Transformer Network, which helps the model focus on expression-rich facial regions; Squeeze-and-Excitation blocks, which model dependencies between feature channels; and a self-attention regularization term that encourages compact and discriminative feature vectors. Furthermore, this paper presents an extensive ablation study conducted on the three datasets, analyzing and confirming the contribution of each module added to EmoNeXt. Finally, we explore EmoNeXt's application to recognizing emotions in elderly individuals with Alzheimer's disease, addressing the urgent need for accurate emotion recognition to improve patient care.
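
To make the Squeeze-and-Excitation mechanism mentioned above concrete, the following is a minimal NumPy sketch of SE channel attention: global average pooling (squeeze), a bottleneck MLP with sigmoid gating (excitation), and per-channel rescaling. The function name, weight shapes, and reduction ratio are illustrative assumptions, not EmoNeXt's actual implementation.

```python
import numpy as np

def squeeze_excite(feature_map, w1, w2):
    """Hedged sketch of a Squeeze-and-Excitation block.

    feature_map: array of shape (C, H, W)
    w1: (C // r, C) bottleneck weights (r = reduction ratio, assumed)
    w2: (C, C // r) expansion weights
    """
    # Squeeze: global average pooling over spatial dims -> one descriptor per channel
    z = feature_map.mean(axis=(1, 2))            # shape (C,)
    # Excitation: bottleneck MLP, ReLU then sigmoid, yields a gate in (0, 1) per channel
    s = np.maximum(w1 @ z, 0.0)                  # shape (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))       # shape (C,)
    # Scale: reweight each channel map by its learned importance
    return feature_map * gate[:, None, None]
```

With zero-initialized weights every gate is sigmoid(0) = 0.5, so the block halves all activations; training shifts these gates so that informative channels are amplified and uninformative ones suppressed, which is the inter-channel dependency modeling the abstract refers to.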