Timely diagnosis of Alzheimer’s disease (AD) and its prodromal stages is critically important: patients differ in neurodegenerative severity and progression risk, and early diagnosis allows intervention and symptomatic treatment to begin before brain damage becomes established. Functional near-infrared spectroscopy (fNIRS) is a promising technique that has been widely used to support early-stage AD diagnosis. This study aims to validate the capability of fNIRS coupled with deep learning (DL) models for multi-class AD classification. First, we conducted a comprehensive experiment comprising resting, cognitive, memory, and verbal tasks. Second, to evaluate AD progression, we examined changes in the hemodynamic responses measured in the prefrontal cortex across four subject groups and between genders. Third, we applied a set of DL architectures to an extremely imbalanced fNIRS dataset. The results showed statistically significant differences between subject groups during the memory and verbal tasks, indicating a correlation between the level of hemoglobin activation and the degree of AD severity. We also observed a gender effect on the hemoglobin changes evoked by functional stimulation. Moreover, the DL models we evaluated improved multi-class classification performance. The highest accuracy (0.909 ± 0.012 on average) was achieved by a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model using the original dataset with three hemoglobin types. DL models also outperformed conventional machine learning algorithms. These findings demonstrate the capability of DL frameworks on imbalanced class distributions and support the potential of fNIRS-based approaches to contribute to the development of AD diagnosis systems.