Object detection in remote sensing images faces the challenges of a complex background, large object size variations, and high inter-class similarity. To address these problems, we propose an adaptive adjacent layer feature fusion (AALFF) method, which is developed on the basis of RTMDet. Specifically, the AALFF method incorporates an adjacent layer feature fusion enhancement (ALFFE) module, designed to capture high-level semantic information and accurately locate object spatial positions. ALFFE also effectively preserves small objects by fusing adjacent layer features and employs involution to aggregate contextual information in a wide spatial range for object essential features extraction in complex backgrounds. Additionally, the adaptive spatial feature fusion (ASFF) module is introduced to guide the network to select and fuse the crucial features to improve the adaptability to objects with different sizes. The proposed method achieves mean average precision (mAP) values of 77.1%, 88.9%, and 95.7% on the DIOR, HRRSD, and NWPU VHR-10 datasets, respectively. Notably, our approach achieves mAP75 values of 60.8% and 79.0% on the DIOR and HRRSD datasets, respectively, surpassing the state-of-the-art performance on the DIOR dataset.