Pan-sharpening, which fuses a high-resolution panchromatic (PAN) image with a low-resolution multispectral image (MSI), is an active research topic in remote sensing. Recently, deep learning has been successfully applied to pan-sharpening. However, existing methods ignore the fact that the MSI and the PAN image have different resolutions and use identical networks to extract features from both images. To address this problem, we propose a two-stream deep learning architecture, called the coupled multi-scale convolutional neural network (CMC), for pan-sharpening. The proposed network has three components: feature extraction subnetworks, a fusion layer, and a super-resolution subnetwork. In the feature extraction stage, two subnetworks extract features from the MSI and the PAN image separately. Convolutional kernels of different sizes are used in their first layers to account for the different spatial resolutions, so that the two source images are mapped to a similar scale. A multi-scale asymmetric convolution factorization is then applied to extract features at different scales. In the fusion layer, the two feature extraction subnetworks are coupled: features at the same scale are first summed, and the summed features of all scales are then concatenated into a single feature map. Finally, a super-resolution subnetwork generates the high-resolution MSI. Experimental results on both synthetic and real data sets demonstrate that the proposed method outperforms state-of-the-art pan-sharpening methods.
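
To make the described architecture concrete, below is a minimal PyTorch sketch of the two-stream design outlined above. The kernel sizes, channel width, number of scales, and the assumption that the MSI is pre-upsampled to the PAN grid before entering its stream are illustrative choices for this sketch, not the exact settings of the proposed CMC network.

```python
# Minimal sketch of a coupled two-stream, multi-scale pan-sharpening network.
# All hyperparameters (kernel sizes, width, scales) are assumptions for illustration.
import torch
import torch.nn as nn


class AsymmetricConv(nn.Module):
    """One scale: a k x k convolution factorized into 1 x k followed by k x 1."""
    def __init__(self, channels, k):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2)),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0)),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)


class Stream(nn.Module):
    """Feature extraction subnetwork: a first convolution whose kernel size depends
    on the input's spatial resolution, followed by multi-scale asymmetric branches."""
    def __init__(self, in_ch, width, first_kernel, scales=(3, 5, 7)):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, width, first_kernel, padding=first_kernel // 2),
            nn.ReLU(inplace=True),
        )
        self.branches = nn.ModuleList(AsymmetricConv(width, k) for k in scales)

    def forward(self, x):
        h = self.head(x)
        return [branch(h) for branch in self.branches]  # one feature map per scale


class CMCSketch(nn.Module):
    def __init__(self, ms_bands=4, width=32, scales=(3, 5, 7)):
        super().__init__()
        # Assumed choice: a smaller first kernel for the (upsampled) low-resolution
        # MSI and a larger one for the high-resolution PAN image.
        self.ms_stream = Stream(ms_bands, width, first_kernel=3, scales=scales)
        self.pan_stream = Stream(1, width, first_kernel=7, scales=scales)
        # Super-resolution subnetwork: maps the fused features to the HR MSI.
        self.sr = nn.Sequential(
            nn.Conv2d(width * len(scales), width, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, ms_bands, 3, padding=1),
        )

    def forward(self, ms_up, pan):
        # ms_up: MSI upsampled to the PAN grid (assumption made for this sketch).
        ms_feats = self.ms_stream(ms_up)
        pan_feats = self.pan_stream(pan)
        # Coupling/fusion: sum same-scale features across streams, concatenate scales.
        fused = torch.cat([m + p for m, p in zip(ms_feats, pan_feats)], dim=1)
        return self.sr(fused)


if __name__ == "__main__":
    ms_up = torch.randn(1, 4, 128, 128)   # 4-band MSI upsampled to the PAN size
    pan = torch.randn(1, 1, 128, 128)     # single-band PAN image
    print(CMCSketch()(ms_up, pan).shape)  # -> torch.Size([1, 4, 128, 128])
```

The per-scale summation followed by cross-scale concatenation mirrors the coupling described for the fusion layer, while the final convolutions stand in for the super-resolution subnetwork that produces the high-resolution MSI.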