Current research on multi-source information fusion for remote sensing images mainly targets images from specific satellite sensors and does not extend to images from other types of data sources. For high-resolution remote sensing images, most mainstream algorithms struggle to produce satisfactory restoration results when the land cover changes significantly. To address these problems, the algorithm proposed in this paper combines sparse representation with the spectral, spatial, and temporal features of remote sensing images for the first time. First, the algorithm simulates the human visual mechanism and obtains the spatial, spectral, and temporal features of a remote sensing image through spatial-spectral dictionary learning and a time-varying weight learning model. Second, local constraints are added to the extraction of temporal features to capture the temporal and geographical change information of heterogeneous remote sensing images. Third, a sparse representation model that combines spatial, spectral, and temporal features is proposed to extract features from high-resolution remote sensing images. Finally, a deep fully convolutional target recognition network based on VGG-16 is proposed, and the extracted feature maps are used as its input to perform target recognition on remote sensing images. Experimental results show that the proposed method improves the accuracy of target recognition.
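
To make the sparse representation step concrete, the following is a minimal sketch of generic sparse coding over a fixed dictionary, solved with the iterative shrinkage-thresholding algorithm (ISTA). It is not the model proposed in this paper: the function names `soft_threshold` and `sparse_code_ista`, the penalty weight `lam`, and the idea of treating `x` as a single stacked feature vector are all assumptions made for illustration, and the dictionary `D` is taken as given rather than learned.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code_ista(x, D, lam=0.1, n_iter=200):
    """Approximately solve min_a 0.5 * ||x - D a||_2^2 + lam * ||a||_1 with ISTA.

    x : (m,) feature vector (hypothetical: stacked spatial, spectral,
        and temporal descriptors of one image patch)
    D : (m, k) dictionary whose columns are atoms (assumed given here)
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # of the quadratic term, i.e. the squared spectral norm of D.
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)  # gradient of 0.5 * ||x - D a||^2
        a = soft_threshold(a - grad / L, lam / L)
    return a


# Toy usage with an identity dictionary (illustrative values only).
x_demo = np.array([3.0, -0.5, 1.0])
a_demo = sparse_code_ista(x_demo, np.eye(3), lam=1.0)
```

With an identity dictionary the problem decouples per coefficient, so the result is simply the soft-thresholded input: entries whose magnitude does not exceed `lam` are set to zero. A larger `lam` gives a sparser code at the cost of a larger reconstruction error.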