Alzheimer's disease (AD) is a progressive neurological disorder characterized by memory loss and cognitive decline, affecting millions of people worldwide. Early detection is crucial for effective treatment, as it can slow disease progression and improve quality of life. Machine learning (ML) has shown promise in AD detection using various medical modalities. In this paper, we propose a novel multi-level stacking model that combines heterogeneous models and modalities to predict different classes of AD. The modalities are cognitive sub-scores (e.g., the Clinical Dementia Rating Sum of Boxes and the Alzheimer's Disease Assessment Scale) from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. In level 1, six base models, Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Naive Bayes (NB), are trained on each modality (ADAS, CDR, and FQA). For each modality, the outputs of the base models on the training set are combined into a stacking training set, and their outputs on the testing set are combined into a stacking testing set. In level 2, three stacking models are produced, one per modality: an RF meta-learner is trained on that modality's stacking training set and evaluated on its stacking testing set. Finally, in level 3, the predictions of the stacking models for the three modalities (ADAS, CDR, and FQA) are merged into a new stacking training set and a new stacking testing set. The former is used to train the final meta-learner, and the latter is used to evaluate it and produce the final prediction.
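The three-level flow described above can be illustrated with a minimal sketch, assuming scikit-learn and synthetic data in place of the ADNI sub-scores. The column groups standing in for the modalities, the hyperparameters, the use of class probabilities as stacking features, and the out-of-fold predictions for training rows are all illustrative assumptions, and the level-3 meta-learner is assumed to be RF because the text does not name it.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cognitive sub-scores (assumption: 3 classes, 12 features).
X, y = make_classification(n_samples=300, n_features=12, n_informative=8,
                           n_classes=3, random_state=0)
# Hypothetical column groups standing in for the three modalities.
modalities = {"ADAS": slice(0, 4), "CDR": slice(4, 8), "FQA": slice(8, 12)}
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def base_models():
    """The six level-1 learners named in the text (default-ish settings)."""
    return [
        RandomForestClassifier(n_estimators=50, random_state=0),
        DecisionTreeClassifier(random_state=0),
        SVC(probability=True, random_state=0),
        LogisticRegression(max_iter=1000),
        KNeighborsClassifier(),
        GaussianNB(),
    ]


def stack_features(models, X_train, y_train, X_test):
    """Return (train, test) matrices of class probabilities, one block per model."""
    train_parts, test_parts = [], []
    for model in models:
        # Out-of-fold probabilities for training rows (assumption, limits leakage).
        train_parts.append(cross_val_predict(
            clone(model), X_train, y_train, cv=5, method="predict_proba"))
        # Refit on the whole training set to score the test rows.
        test_parts.append(
            clone(model).fit(X_train, y_train).predict_proba(X_test))
    return np.hstack(train_parts), np.hstack(test_parts)


level3_train, level3_test = [], []
for name, cols in modalities.items():
    # Level 1: six base models per modality -> stacking training/testing sets.
    s_tr, s_te = stack_features(base_models(), X_tr[:, cols], y_tr, X_te[:, cols])
    # Level 2: one RF meta-learner per modality, trained on the level-1 outputs.
    meta = RandomForestClassifier(n_estimators=50, random_state=0)
    m_tr, m_te = stack_features([meta], s_tr, y_tr, s_te)
    level3_train.append(m_tr)
    level3_test.append(m_te)

# Level 3: merge the per-modality outputs and train the final meta-learner.
final_meta = RandomForestClassifier(n_estimators=50, random_state=0)  # assumed RF
final_meta.fit(np.hstack(level3_train), y_tr)
final_pred = final_meta.predict(np.hstack(level3_test))
accuracy = float((final_pred == y_te).mean())
print(f"level-3 test accuracy on synthetic data: {accuracy:.3f}")
```

Using out-of-fold predictions for the training rows is one common way to keep a meta-learner from fitting to base-model outputs on data those models have already seen; whether the original pipeline does this is not stated.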
Our research also aims to provide model explanations through explainable artificial intelligence (XAI), supporting efficiency, effectiveness, and trust. Feature selection based on Particle Swarm Optimization (PSO) is used to select the most relevant sub-scores. The results demonstrate that the multi-modality approach outperforms single-modality approaches. Moreover, with the selected features, the proposed multi-level stacking model achieves the highest performance compared with regular ML classifiers and with stacking models trained on the full set of multi-modality features, reaching accuracy, precision, recall, and F1-score of 92.08%, 92.07%, 92.08%, and 92.01% for two classes, and 90.03%, 90.19%, 90.03%, and 90.05% for three classes, respectively. These results indicate significant potential for improving early disease diagnosis.
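As a rough illustration of PSO-based feature selection, the sketch below uses a binary PSO with a sigmoid transfer function on synthetic data. The PSO variant, the swarm size, the coefficients `w`, `c1`, `c2`, and the fitness function (cross-validated accuracy of a logistic regression) are assumptions chosen for brevity, not details taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in for the sub-scores (assumption: 10 features, 4 informative).
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=0, random_state=0)


def fitness(mask):
    """Mean 3-fold CV accuracy using only the features where mask is nonzero."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty feature subset is treated as worthless
    clf = LogisticRegression(max_iter=1000)
    return float(cross_val_score(clf, X[:, mask], y, cv=3).mean())


n_particles, n_iters, n_feats = 8, 10, X.shape[1]
w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (illustrative)

# Positions are 0/1 vectors (1 = feature kept); velocities are real-valued.
pos = (rng.random((n_particles, n_feats)) < 0.5).astype(float)
vel = rng.uniform(-1.0, 1.0, (n_particles, n_feats))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()
gbest_fit = float(pbest_fit.max())
history = [gbest_fit]

for _ in range(n_iters):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    # Standard velocity update pulling toward personal and global bests.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    vel = np.clip(vel, -4.0, 4.0)
    # Sigmoid transfer: velocity sets the probability that each bit is 1.
    prob = 1.0 / (1.0 + np.exp(-vel))
    pos = (rng.random(vel.shape) < prob).astype(float)

    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved] = pos[improved]
    pbest_fit[improved] = fit[improved]
    if pbest_fit.max() > gbest_fit:
        gbest_fit = float(pbest_fit.max())
        gbest = pbest[pbest_fit.argmax()].copy()
    history.append(gbest_fit)

selected = np.flatnonzero(gbest)
print("selected feature indices:", selected.tolist())
print(f"best CV accuracy: {gbest_fit:.3f}")
```

In a pipeline like the one described, the selected indices would be used to subset the sub-scores before the level-1 base models are trained.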