Hyperspectral image (HSI) super-resolution (SR) is a challenging task due to its ill-posed nature, and has attracted extensive attention by the research community. Previous methods concentrated on leveraging various hand-crafted image priors of a latent high-resolution hyperspectral (HR-HS) image to regularize the degradation model of the observed low-resolution hyperspectral (LR-HS) and HR-RGB images. Different optimization strategies for searching a plausible solution, which usually leads to a limited reconstruction performance, were also exploited. Recently, deep-learning-based methods evolved for automatically learning the abundant image priors in a latent HR-HS image. These methods have made great progress for HS image super resolution. Current deep-learning methods have faced difficulties in designing more complicated and deeper neural network architectures for boosting the performance. They also require large-scale training triplets, such as the LR-HS, HR-RGB, and their corresponding HR-HS images for neural network training. These training triplets significantly limit their applicability to real scenarios. In this work, a deep unsupervised fusion-learning framework for generating a latent HR-HS image using only the observed LR-HS and HR-RGB images without previous preparation of any other training triplets is proposed. Based on the fact that a convolutional neural network architecture is capable of capturing a large number of low-level statistics (priors) of images, the automatic learning of underlying priors of spatial structures and spectral attributes in a latent HR-HS image using only its corresponding degraded observations is promoted. Specifically, the parameter space of a generative neural network used for learning the required HR-HS image to minimize the reconstruction errors of the observations using mathematical relations between data is investigated. Moreover, special convolutional layers for approximating the degradation operations between observations and the latent HR-HS image are specifically to construct an end-to-end unsupervised learning framework for HS image super-resolution. Experiments on two benchmark HS datasets, including the CAVE and Harvard, demonstrate that the proposed method can is capable of producing very promising results, even under a large upscaling factor. Furthermore, it can outperform other unsupervised state-of-the-art methods by a large margin, and manifests its superiority and efficiency.