Remote sensing images often suffer from low quality due to equipment limitations, and identifying a target object is especially difficult when it is blurred or small. The main challenge is that objects in remote sensing images occupy very few pixels. Traditional convolutional networks struggle to extract sufficient information through local convolutions and are easily disturbed by noise, so they usually perform poorly on small-target classification and detection. A common remedy is to process feature-map information at multiple scales, but this approach does not consider the supplementary effect that the contextual information of the feature map has on its semantics. In this work, to enable CNNs to make full use of context information and improve their representation ability, we propose a residual attention feature fusion method, which improves the representation ability of feature maps by fusing contextual feature-map information at different scales, and then propose a spatial attention mechanism based on the global pixel-wise convolution response. This mechanism compresses global pixels through convolution and weights the pixels of the original feature map, reducing noise interference and improving the network's ability to capture globally critical pixel information. Ship recognition experiments on remote sensing image datasets show that the proposed network structure improves small-target detection performance, and results on CIFAR-10 and CIFAR-100 demonstrate that the attention mechanism is general and practical.
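The spatial attention step described above (compress global pixels through a convolution, then weight the original feature-map pixels with the response) can be sketched in NumPy. This is a minimal illustration, not the paper's exact formulation: the 1x1-convolution weights `w` and the sigmoid squashing are assumptions made for the example.

```python
import numpy as np

def spatial_attention(x, w):
    """Illustrative spatial attention: compress channels with a 1x1
    convolution (here a channel-wise dot product), squash the per-pixel
    response with a sigmoid, and rescale the original feature map."""
    # x: feature map of shape (C, H, W); w: assumed 1x1 conv weights, shape (C,)
    response = np.tensordot(w, x, axes=([0], [0]))   # (H, W) per-pixel response
    attn = 1.0 / (1.0 + np.exp(-response))           # sigmoid -> weights in (0, 1)
    return x * attn[None, :, :], attn                # output keeps shape (C, H, W)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))   # toy feature map: 8 channels, 4x4 spatial grid
w = rng.standard_normal(8)           # hypothetical learned 1x1 conv weights
y, attn = spatial_attention(x, w)
print(y.shape, attn.shape)
```

Because every pixel of the attention map lies in (0, 1), noisy activations are attenuated rather than amplified, which matches the stated goal of reducing noise interference while preserving the feature map's shape.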