Deep neural networks show great promise for classifying brain diseases and making prognostic assessments based on neuroimaging data, but large, labeled training datasets are often required to achieve high predictive accuracy. Here we evaluated a range of transfer learning (pre-training) strategies to create useful MRI representations for downstream tasks that lack large amounts of training data, such as Alzheimer's disease (AD) classification. To test our proposed pre-training strategies, we analyzed 4,098 3D T1-weighted brain MRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort and independently validated the models for detecting AD on an out-of-distribution test set of 600 scans from the Open Access Series of Imaging Studies (OASIS3) cohort. First, we trained 3D and 2D convolutional neural network (CNN) architectures. We then tested combinations of multiple pre-training strategies based on (1) supervised learning, (2) contrastive learning, and (3) self-supervised learning, using pre-training data from within versus outside the MRI domain. In our experiments, the 3D CNN pre-trained with contrastive learning provided the best overall results: when fine-tuned on T1-weighted scans for AD classification with all of the training data from ADNI, it outperformed the baseline by 2.8%. We also show test performance as a function of the training dataset size and the chosen pre-training method. Transfer learning offered significant benefits in low-data regimes, with a performance boost of 7.7%. When the pre-trained model was used for AD classification, we were able to visualize an improved clustering of test subjects' diagnostic groups, as illustrated via a uniform manifold approximation and projection (UMAP) of the high-dimensional model embedding space. Further, saliency maps indicate additional regions of the brain scan that were activated when pre-training was used and that contributed most to the final prediction score.
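To make the contrastive pre-training idea concrete, the following is a minimal sketch of one widely used contrastive objective, the NT-Xent loss popularized by SimCLR. The abstract does not state which contrastive loss was used, so this choice is an assumption, as are the function names `cosine_similarity` and `nt_xent_loss` and the `temperature` default. Embeddings are plain Python lists standing in for CNN outputs, and each must be non-zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """Illustrative NT-Xent loss (an assumed objective, not the paper's stated one).

    view_a[i] and view_b[i] are embeddings of two augmented views of the
    same scan i. Every other embedding in the batch acts as a negative.
    """
    n = len(view_a)
    embeddings = list(view_a) + list(view_b)  # 2n embeddings in total
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same scan
        # Scaled similarities to every embedding except the anchor itself.
        logits = [
            cosine_similarity(embeddings[i], embeddings[j]) / temperature
            for j in range(2 * n)
            if j != i
        ]
        positive_logit = (
            cosine_similarity(embeddings[i], embeddings[positive]) / temperature
        )
        # Numerically stable log-sum-exp over the denominator terms.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        total += log_denominator - positive_logit
    return total / (2 * n)


if __name__ == "__main__":
    # Toy 2D embeddings for two scans, each with two views.
    scans_view_a = [[1.0, 0.0], [0.0, 1.0]]
    scans_view_b = [[0.9, 0.1], [0.1, 0.9]]
    print("NT-Xent loss:", nt_xent_loss(scans_view_a, scans_view_b))
```

The loss is lower when the two views of the same scan map to nearby embeddings and different scans map far apart, which is the property that pre-training is meant to instill before fine-tuning on the AD classification task.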