BACKGROUND: Alzheimer’s disease (AD) endangers the physical and mental health of the elderly and constitutes one of the most pressing social challenges. Because effective drugs for AD intervention are lacking, it is very important to diagnose AD at an early stage, especially in the Mild Cognitive Impairment (MCI) phase. OBJECTIVE: An automatic classification technology is urgently needed to assist doctors in analyzing the status of candidate patients. Artificial-intelligence-enhanced AD detection can reduce the cost of detecting the disease. METHODS: In this paper, a novel pre-trained ensemble-based AD detection (PEADD) framework with three base learners (i.e., ResNet, VGG, and EfficientNet) is proposed for both audio-based and PET (Positron Emission Tomography)-based AD detection under a unified image modality. Specifically, for audio-based AD detection, the effectiveness of context-enriched image modalities in place of the traditional speech modality (i.e., a context-free audio matrix), together with a simple and efficient image denoising strategy, is examined comprehensively. PET-based AD detection on denoised PET images is also described. Furthermore, different voting methods for the ensemble strategy (i.e., hard voting and soft voting) are investigated in detail. RESULTS: The classification accuracy was 92% and 99% on the audio-based and PET-based AD datasets, respectively. Extensive experimental results demonstrate that PEADD outperforms the state-of-the-art methods on both the audio-based and the PET-based AD datasets. CONCLUSIONS: The proposed model can provide an objective basis for doctors to detect Alzheimer’s disease.
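
ILLUSTRATION: The difference between the two voting methods named in METHODS can be shown with a minimal sketch. It is not the PEADD implementation: the function names `hard_vote` and `soft_vote`, the two-class setup, and the probability values are all hypothetical, and the three rows merely stand in for the class probabilities that three base learners might output. Hard voting takes the majority of the predicted labels, whereas soft voting averages the class probabilities before choosing a label.

```python
import numpy as np


def hard_vote(probs):
    """Majority vote over each learner's predicted label.

    probs has shape (n_learners, n_samples, n_classes).
    Ties are broken in favour of the lowest class index.
    """
    probs = np.asarray(probs, dtype=float)
    n_classes = probs.shape[2]
    votes = probs.argmax(axis=2)  # (n_learners, n_samples)
    # Count how many learners voted for each class, per sample.
    counts = np.stack(
        [(votes == c).sum(axis=0) for c in range(n_classes)], axis=1
    )  # (n_samples, n_classes)
    return counts.argmax(axis=1)


def soft_vote(probs):
    """Average the class probabilities across learners, then take the argmax."""
    probs = np.asarray(probs, dtype=float)
    return probs.mean(axis=0).argmax(axis=1)


# Hypothetical probabilities for 2 samples and 2 classes (0 = control, 1 = AD).
# These numbers are made up for illustration; they are not results from the paper.
probs = np.array([
    [[0.55, 0.45], [0.20, 0.80]],  # stand-in for base learner 1
    [[0.60, 0.40], [0.30, 0.70]],  # stand-in for base learner 2
    [[0.05, 0.95], [0.40, 0.60]],  # stand-in for base learner 3
])

hard_labels = hard_vote(probs)
soft_labels = soft_vote(probs)
print("hard voting:", hard_labels.tolist())
print("soft voting:", soft_labels.tolist())
```

For the first sample, two weakly confident learners outvote one highly confident learner under hard voting, while soft voting lets the confident learner decide, so the two methods can disagree on the same inputs.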