The application of mulching film has significantly contributed to improving agricultural output and benefits, but residual film has caused severe impacts on agricultural production and the environment. In order to realize the accurate recycling of agricultural residual film, the detection of residual film is the first problem to be solved. The difference in color and texture between residual film and bare soil is not obvious, and residual film is of various sizes and morphologies. To solve these problems, the paper proposes a method for detecting residual film in agricultural fields that uses the attention mechanism. First, a two-stage pre-training approach with strengthened memory is proposed to enable the model to better understand the residual film features with limited data. Second, a multi-scale feature fusion module with adaptive weights is proposed to enhance the recognition of small targets of residual film by using attention. Finally, an inter-feature cross-attention mechanism that can realize full interaction between shallow and deep feature information to reduce the useless noise extracted from residual film images is designed. The experimental results on a self-made residual film dataset show that the improved model improves precision, recall, and mAP by 5.39%, 2.02%, and 3.95%, respectively, compared with the original model, and it also outperforms other recent detection models. The method provides strong technical support for accurately identifying farmland residual film and has the potential to be applied to mechanical equipment for the recycling of residual film.