Pansharpening aims to take full advantage of the complementary spectral and spatial information provided by a low-spatial-resolution (LR) multispectral (MS) image and its associated high-spatial-resolution (HR) panchromatic (PAN) image, respectively, producing a fused MS image with both high spectral and high spatial resolution. Many methods based on convolutional neural networks (CNNs) have recently been developed for the pansharpening task, but most of them still have two drawbacks: 1) information cannot flow efficiently through their simply stacked convolutional architectures, which hinders the representation ability of the networks; 2) they are commonly trained in a supervised manner, which not only requires extra effort to produce simulated training data but can also lead to scale-related problems in the fusion results. In this paper, we propose a novel unsupervised CNN-based pansharpening method to overcome these limitations. Specifically, we design an iterative network architecture in which a PAN-guided strategy and a set of skip connections (SC) are adopted to continuously extract and fuse features from the input, thus enhancing information reuse and transmission. In addition, we propose a new loss function for unsupervised training, in which the relationships of the fused MS image to the input PAN and MS images are used to design the spatial constraint and the spectral consistency terms, respectively. The typical quality with no reference (QNR) index is also added to this function to further balance the spectral and spatial qualities. The designed loss function allows the network to be trained on the input images alone, without any handcrafted labels (i.e., a reference HR MS image). We evaluate the effectiveness of the designed network architecture and the combined loss function, and the experiments confirm that our unsupervised strategy obtains promising results with minor spectral and spatial distortions compared with other traditional and supervised methods.
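To make the structure of such an unsupervised objective concrete, the following is a minimal PyTorch-style sketch, not the exact formulation proposed in this paper: it assumes an L1 spectral consistency between the degraded fused image and the LR MS input, a gradient-based spatial constraint between the band-averaged fused intensity and the PAN image, and a precomputed differentiable QNR estimate passed in as `qnr_term`; the weights `w_spec`, `w_spat`, and `w_qnr` and all function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def spectral_consistency(fused, ms_lr, scale=4):
    """Spectral term: the fused image, degraded back to the MS resolution,
    should match the original LR MS input (illustrative L1 formulation)."""
    fused_lr = F.interpolate(fused, scale_factor=1.0 / scale,
                             mode='bilinear', align_corners=False)
    return F.l1_loss(fused_lr, ms_lr)

def spatial_constraint(fused, pan):
    """Spatial term: the band-averaged intensity of the fused image should
    share the PAN image's spatial structure (here, matched gradients)."""
    intensity = fused.mean(dim=1, keepdim=True)
    def grads(x):
        # Horizontal and vertical finite differences.
        return x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]
    gx_f, gy_f = grads(intensity)
    gx_p, gy_p = grads(pan)
    return F.l1_loss(gx_f, gx_p) + F.l1_loss(gy_f, gy_p)

def unsupervised_pansharpening_loss(fused, ms_lr, pan, qnr_term,
                                    w_spec=1.0, w_spat=1.0, w_qnr=0.1):
    """Combined unsupervised loss: spectral consistency + spatial constraint
    + (1 - QNR), where qnr_term is assumed to be a differentiable QNR value
    in [0, 1] computed elsewhere (hypothetical helper, not shown)."""
    return (w_spec * spectral_consistency(fused, ms_lr)
            + w_spat * spatial_constraint(fused, pan)
            + w_qnr * (1.0 - qnr_term))
```

The key design point reflected here is that every term is computed from the network inputs (MS and PAN) and the network output alone, so no reference HR MS image is ever required during training.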