We propose an unmixing framework for enhancing endmember fraction maps using a combination of spectral and visible images. The new method, data fusion through spatial information-aided learning (DFuSIAL), is based on a learning process for the fusion of a multispectral image of low spatial resolution and a visible RGB image of high spatial resolution. Unlike commonly used methods, DFuSIAL allows for fusing data from different sensors. To achieve this objective, we apply a learning process using automatically extracted invariant points, which are assumed to have the same land cover type in both images. First, we estimate the fraction maps of a set of endmembers for the spectral image. Then, we train a spatial-features aided neural network (SFFAN) to learn the relationship between the fractions, the visible bands, and rotation-invariant spatial features for learning (RISFLs) that we extract from the RGB image. Our experiments show that the proposed DFuSIAL method obtains fraction maps with significantly enhanced spatial resolution and an average mean absolute error between 2% and 4% compared to the reference ground truth. Furthermore, it is shown that the proposed method is preferable to other examined state-of-the-art methods, especially when data is obtained from different instruments and in cases with missing-data pixels.