This study aimed to identify biomarkers of major depressive disorder (MDD) by relating neuroimaging-derived measures to binary (MDD/control), ordinal (severe MDD/mild MDD/control), or continuous (depression severity) outcomes. To address MDD heterogeneity, factors (severity of psychic depression, motivation, anxiety, psychosis, and sleep disturbance) were also used as outcomes. A multisite, multimodal imaging cohort (diffusion MRI [dMRI] and structural MRI [sMRI]; 52 controls and 147 MDD patients) was analyzed with several modeling techniques: penalized logistic regression, random forest, and support vector machine (SVM). An additional cohort (25 controls and 83 MDD patients) was used for external validation. The best-performing classifier (SVM) had a 26.0% misclassification rate (binary), 52.2 ± 1.69% accuracy (ordinal), and a correlation coefficient of r = .36 (p < .001; continuous). Using SVM, R² values for predicting each of the MDD factors were <10%. Binary classification in the external data set resulted in 87.95% sensitivity and 32.00% specificity. Although the observed classification rates are too low for clinical utility, four image-based features contributed to accuracy across all models and analyses: two dMRI-based measures (average fractional anisotropy in the right cuneus and left insula) and two sMRI-based measures (asymmetry in the volume of the pars triangularis and the cerebellum). These features may serve as a priori regions for future analyses. The poor classification and prediction accuracy found here is consistent with current equivocal findings and sheds light on the challenges of using these modalities for MDD biomarker identification. Further, this study suggests a paradigm (e.g., multiple classifier evaluation with external validation) for future studies to avoid nongeneralizable results.
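
The external-validation figures above can be unpacked with a short worked calculation. The sketch below is a reading aid rather than part of the study's analysis. It assumes the confusion-matrix counts can be recovered by rounding the reported sensitivity and specificity against the stated cohort sizes (83 MDD patients, 25 controls). The variable names are illustrative, and the derived accuracy figures are not reported in the abstract itself.

```python
# Cohort sizes reported for the external validation set.
n_mdd, n_control = 83, 25

# Assumption: counts are inferred from the rounded percentages in the
# abstract (87.95% sensitivity, 32.00% specificity), not taken from the paper.
true_pos = round(0.8795 * n_mdd)       # MDD patients classified as MDD
true_neg = round(0.3200 * n_control)   # controls classified as control
false_neg = n_mdd - true_pos           # MDD patients classified as control
false_pos = n_control - true_neg       # controls classified as MDD

sensitivity = true_pos / n_mdd
specificity = true_neg / n_control

# Overall accuracy weights each class by its size; balanced accuracy
# averages the two per-class rates, so it is not inflated by class imbalance.
accuracy = (true_pos + true_neg) / (n_mdd + n_control)
balanced_accuracy = (sensitivity + specificity) / 2

# Baseline: label every subject as the larger class (MDD).
majority_baseline = max(n_mdd, n_control) / (n_mdd + n_control)

print(f"TP={true_pos} FN={false_neg} TN={true_neg} FP={false_pos}")
print(f"sensitivity       = {sensitivity:.2%}")
print(f"specificity       = {specificity:.2%}")
print(f"accuracy          = {accuracy:.2%}")
print(f"balanced accuracy = {balanced_accuracy:.2%}")
print(f"majority baseline = {majority_baseline:.2%}")
```

Under these inferred counts, overall accuracy in the external cohort is 75.0%, slightly below the 76.9% obtained by labeling every subject as MDD, and balanced accuracy is about 60%. This is in line with the statement that the observed classification rates are too low for clinical utility.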