Multi-spectral imaging (MSI) is a novel non-invasive tool for visualizing the entire span of the eye, from the internal limiting membrane to the choroid. However, spatial misalignments are frequently observed in sequential MSI images because saccadic eye movements are usually faster than MSI image acquisition. Registering MSI images is therefore necessary for computer-based analysis of retinal degeneration via MSI. In this paper, we propose an early deep learning framework for accurate group-wise registration of MSI images. The framework contains three parts: template construction based on principal component analysis, deformation field calculation, and spatial transformation. The framework is uniquely capable of resolving two key challenges: the "multimodal" characteristics of MSI images, which arise from acquisition at different spectra, and the requirement of joint registration of the sequential images. Our experimental results demonstrate the superior performance of our framework over several representative state-of-the-art techniques in both speed and accuracy.

INDEX TERMS Multi-spectral images, group-wise registration, deep learning, mono/multi-modal images.
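
As a rough illustration of the first of the three parts named in the abstract, the sketch below builds a single template image from a group of spectral bands using principal component analysis. It is a minimal sketch under stated assumptions, not the framework's actual implementation: the function name `pca_template`, the `(n_bands, H, W)` array layout, and the choice to project onto only the first principal component are all hypothetical, and the construction used in the paper may differ.

```python
import numpy as np


def pca_template(stack: np.ndarray) -> np.ndarray:
    """Build one template image from a group of spectral bands via PCA.

    Assumption (not from the paper): `stack` has shape (n_bands, H, W),
    and the template is the projection of every pixel's spectral vector
    onto the first principal component computed across bands.
    """
    n_bands, height, width = stack.shape
    # Flatten each band to a row vector: (n_bands, H*W).
    flat = stack.reshape(n_bands, height * width).astype(np.float64)
    # Center each band so the covariance reflects structure, not brightness.
    flat = flat - flat.mean(axis=1, keepdims=True)
    # Small (n_bands x n_bands) covariance matrix across bands.
    cov = flat @ flat.T / (flat.shape[1] - 1)
    # eigh returns eigenvalues in ascending order; the last column is PC1.
    _, eigvecs = np.linalg.eigh(cov)
    pc1 = eigvecs[:, -1]
    # Eigenvector sign is arbitrary; fix it so bands contribute positively.
    if pc1.sum() < 0:
        pc1 = -pc1
    # Weighted combination of the centered bands gives the template.
    return (pc1 @ flat).reshape(height, width)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.random((8, 8))
    # Four synthetic "bands": the same structure at different gains/offsets.
    bands = np.stack([g * base + o for g, o in [(1.0, 0.0), (0.5, 0.1),
                                                (2.0, 0.3), (1.5, 0.2)]])
    template = pca_template(bands)
    print(template.shape)
```

In a group-wise setting, a template of this kind would serve as the common reference toward which every band is warped, so that no single spectral band has to be chosen as the fixed image.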