Early detection of Alzheimer’s disease (AD), such as predicting progression from mild cognitive impairment (MCI) to AD, is critical for slowing disease progression and improving quality of life. Although deep learning is a promising technique for structural MRI-based diagnosis, the scarcity of training samples limits its power, especially for three-dimensional (3D) models. To address this limitation, we propose a two-stage model that combines transfer learning and contrastive learning and achieves accurate MRI-based early AD diagnosis even when the number of samples is restricted. Specifically, a 3D CNN model is first pretrained on publicly available medical image data to learn common medical features, and contrastive learning is then used to learn more specific features of MCI images. The two-stage model outperformed each benchmark method. Compared with previous studies, our model achieves superior performance in identifying progressive MCI patients, with an accuracy of 0.82 and an AUC of 0.84. We further enhance the interpretability of the model by using 3D Grad-CAM, which highlights brain regions with high predictive weights. Brain regions including the hippocampus, temporal lobe, and precuneus are associated with the classification of MCI, a finding supported by previous literature. Our model offers a novel approach to mitigating the overfitting caused by a lack of medical data and enables the early detection of AD.
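To make the second stage more concrete, the following is a minimal sketch of a supervised contrastive loss, which pulls embeddings with the same label together and pushes embeddings with different labels apart. The abstract does not state which contrastive objective is used, so this formulation is an assumption chosen for illustration, not the authors' method. The function name, the temperature value, and the toy two-dimensional embeddings (standing in for features that a pretrained 3D CNN encoder might produce) are all hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised contrastive loss (an assumed objective).

    embeddings: list of equal-length, non-zero feature vectors
    labels:     list of class labels, one per embedding
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        # Positives are the other samples that share the anchor's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp over all non-anchor samples, stabilized by the maximum.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        counted += 1
    if counted == 0:
        raise ValueError("no anchor has a positive sample")
    return total / counted


# Toy embeddings: two tight clusters (hypothetical values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Hypothetical labels: 1 = progressive MCI, 0 = non-progressive MCI.
aligned = supervised_contrastive_loss(features, [1, 1, 0, 0])
misaligned = supervised_contrastive_loss(features, [1, 0, 1, 0])
print(aligned < misaligned)
```

When the labels agree with the cluster structure of the embeddings, the loss is lower than when they cut across it, which is the signal an encoder would be fine-tuned to minimize in a setup like this.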