Deep learning-based image super-resolution has achieved strong performance in improving image quality. In this paper, an RGB-IR cross-input and sub-pixel upsampling network is proposed to increase the spatial resolution of an infrared (IR) image by combining it with a color image of higher spatial resolution obtained with a different imaging modality. Specifically, the feature maps of the two inputs, RGB and IR, are fused during reconstruction of the infrared image. To improve the accuracy of feature extraction, deconvolution is replaced by sub-pixel convolution for upsampling within the network. A guided filter layer is then introduced to denoise the IR image while preserving image detail. In addition, we collect an experimental dataset containing a large number of RGB images paired with IR images of the same scenes. Experimental results on our dataset and on other datasets demonstrate that the proposed method outperforms existing methods in both accuracy and visual quality.
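As an illustration of the sub-pixel upsampling step mentioned above, the following is a minimal NumPy sketch of the channel-to-space rearrangement that typically follows a sub-pixel convolution. It is not the network proposed in this paper: the function name `pixel_shuffle`, the channel ordering, and the toy input are assumptions made for illustration only, and the preceding learned convolution is omitted.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into a (C, H*r, W*r) image.

    Hypothetical helper for illustration. The channel ordering assumed here
    maps input channel c*r*r + i*r + j at position (h, w) to output channel c
    at position (h*r + i, w*r + j).
    """
    c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    # Split the channel axis into (c, i, j), the two sub-pixel offsets.
    x = x.reshape(c, r, r, h, w)
    # Interleave the offsets with the spatial axes: (c, h, i, w, j).
    x = x.transpose(0, 3, 1, 4, 2)
    # Merge (h, i) and (w, j) into the upsampled height and width.
    return x.reshape(c, h * r, w * r)


# Toy low-resolution feature map: 4 channels of size 2x3, upscale factor 2.
features = np.arange(4 * 2 * 3, dtype=np.float32).reshape(4, 2, 3)
upsampled = pixel_shuffle(features, 2)
print(upsampled.shape)
```

Because the rearrangement only moves values and all learned filtering happens at low resolution, each output pixel is taken from exactly one input channel, in contrast to deconvolution, where overlapping kernel footprints contribute to the same output location.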