Parkinson's disease (PD) is the second most common neurological disease, and no specific medical test exists for its diagnosis. In this study, we consider PD detection based on multimodal voice data collected through two channels, i.e., Smart Phone (SP) and Acoustic Cardioid (AC). Four data modalities were collected through each channel, namely sustained phonation (P), speech (S), voiced (V), and unvoiced (U). The contributions of this paper are twofold. First, it identifies the data modality and the features that carry the most information about PD. Second, it proposes a MultiModal Data–Driven Ensemble (MMDD-Ensemble) approach for PD detection. The MMDD-Ensemble has two levels. At the first level, base classifiers are trained on the multimodal voice data. At the second level, the predictions of the base classifiers are fused using blending and voting methods. To validate the robustness of the proposed method, several evaluation measures are adopted, including accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and area under the curve (AUC). The proposed method outperformed the best results of the optimal unimodal framework on both key evaluation measures, i.e., accuracy and AUC. Furthermore, it outperformed other state-of-the-art ensemble models. Experimental results show that the proposed multimodal approach yields 96% accuracy, 100% sensitivity, 88.88% specificity, an MCC of 0.914, and an AUC of 0.986. These results are promising compared with recently reported results for PD detection based on multimodal voice data.
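As an illustration of the second-level fusion and the evaluation measures named above, the following minimal sketch fuses base-classifier labels by majority voting and computes accuracy, sensitivity, specificity, and MCC from confusion-matrix counts. It is not the authors' implementation: the function names, the toy predictions, the tie-breaking rule, and the example counts are all assumptions made for illustration, and blending and AUC are omitted.

```python
from math import sqrt


def majority_vote(predictions):
    """Fuse per-classifier label lists (0 = healthy, 1 = PD) by majority vote."""
    n_classifiers = len(predictions)
    n_samples = len(predictions[0])
    fused = []
    for i in range(n_samples):
        votes_for_pd = sum(clf_preds[i] for clf_preds in predictions)
        # Assumption: ties (possible with an even number of classifiers) go to PD.
        fused.append(1 if 2 * votes_for_pd >= n_classifiers else 0)
    return fused


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and MCC from confusion counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy first-level outputs: one row per hypothetical base classifier.
base_predictions = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0],
]
y_true = [1, 1, 0, 0, 1, 0]

fused = majority_vote(base_predictions)
toy_scores = metrics(*confusion_counts(y_true, fused))

# Illustrative counts only (not taken from the study): 16 TP, 8 TN, 1 FP, 0 FN.
example_scores = metrics(tp=16, tn=8, fp=1, fn=0)
```

With the illustrative counts, `example_scores` gives an accuracy of 0.96, a sensitivity of 1.0, a specificity of about 0.889, and an MCC of about 0.915, which shows how the four measures relate to a single confusion matrix. The counts were chosen only to make the arithmetic concrete.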