Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by impairments in verbal and nonverbal communication, repetitive behaviour patterns and difficulties with social interaction. Understanding mental states from facial expressions (FEs) is crucial for interpersonal and social interaction. However, occlusions such as glasses, facial hair or self‐occlusion make it harder to identify facial expressions accurately. This research addresses the problem of recognizing facial expressions when parts of the face are occluded. Its goal is a robust framework for facial expression recognition (FER) that handles occlusions better and increases recognition accuracy. To this end, we propose a novel Improved DenseNet‐based Residual Cross‐Attention Transformer (IDenseNet‐RCAformer) system for partially occluded FER in autism patients. The framework's efficacy is assessed on four facial expression datasets, and several preprocessing procedures are applied to improve recognition efficiency. A simple argmax function is then applied to obtain the predicted landmark position from a heatmap. In the feature extraction phase, local and global representations are captured from the preprocessed images using the Inception‐ResNet‐V2 approach and the Cross‐Attention Transformer, respectively. The two sets of features are fused using the FusionNet method, which enhances the system's training speed and precision. After feature extraction, an improved DenseNet mechanism is applied to recognize a variety of facial expressions in partially occluded images of autism patients. Several performance metrics are computed and analysed to demonstrate the effectiveness of the proposed approach, where the IDenseNet‐RCAformer performs best, with an accuracy of 98.95%.
According to the experimental findings, the proposed framework significantly outperforms prior recognition frameworks in terms of recognition performance.
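
The landmark step mentioned above, in which an argmax over a heatmap gives the predicted landmark position, can be illustrated with a minimal sketch. It assumes one 2D heatmap per landmark, stored as a list of rows. The function names `heatmap_argmax` and `decode_landmarks` and the toy values are hypothetical and are not taken from the proposed system.

```python
def heatmap_argmax(heatmap):
    """Return the (row, col) index of the largest value in a 2D heatmap.

    Assumption: `heatmap` is a non-empty list of equal-length rows of floats.
    Ties are resolved in favour of the first maximum in row-major order.
    """
    if not heatmap or not heatmap[0]:
        raise ValueError("heatmap must be a non-empty 2D grid")
    best_value = float("-inf")
    best_pos = (0, 0)
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_value:  # strict '>' keeps the first maximum
                best_value = value
                best_pos = (r, c)
    return best_pos


def decode_landmarks(heatmaps):
    """Decode one predicted (row, col) position per landmark heatmap."""
    return [heatmap_argmax(h) for h in heatmaps]


# Toy example: two 3x3 heatmaps, one per hypothetical landmark.
toy_heatmaps = [
    [[0.0, 0.1, 0.0],
     [0.2, 0.9, 0.1],
     [0.0, 0.1, 0.0]],
    [[0.0, 0.0, 0.7],
     [0.1, 0.2, 0.3],
     [0.0, 0.1, 0.0]],
]
landmarks = decode_landmarks(toy_heatmaps)
print(landmarks)
```

A hard argmax of this kind returns integer pixel coordinates, so the precision of each decoded landmark is limited by the heatmap resolution.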