There are problems associated with facial expression recognition (FER), such as facial occlusion and head pose variations. These two problems lead to incomplete facial information in images, making feature extraction extremely difficult. Most current methods use prior knowledge or fixed-size patches to perform local cropping, thereby enhancing the ability to acquire fine-grained features. However, the former requires extra data processing work and is prone to errors; the latter destroys the integrity of local features. In this paper, we propose a local Sliding Window Attention Network (SWA-Net) for FER. Specifically, we propose a sliding window strategy for feature-level cropping, which preserves the integrity of local features and does not require complex preprocessing. Moreover, the local feature enhancement module mines fine-grained features with intraclass semantics through a multiscale depth network. The adaptive local feature selection module is introduced to prompt the model to find more essential local features. Extensive experiments demonstrate that our SWA-Net model achieves a comparable performance to that of state-of-the-art methods with scores of 90.03% on RAF-DB, 89.22% on FERPlus, 63.97% on AffectNet.