Deep learning (DL) techniques are increasingly used to diagnose Parkinson’s disease (PD) because they enable non-invasive, easily accessible tools. These methods can improve early detection and diagnosis, which is crucial for managing the disease effectively. This study explores end-to-end DL architectures, such as convolutional neural networks and Transformers, for diagnosing PD from self-reported voice data collected via smartphones in everyday settings. We applied transfer learning, starting from models pre-trained on large datasets from the image and audio domains and then fine-tuning them on the mPower voice data. The Transformer model pre-trained on voice data performed best, achieving an average AUC of
and an average AUPRC of
, outperforming models trained from scratch. To the best of our knowledge, this is the first application of a Transformer model to audio data for PD diagnosis on this dataset. Relying on voice alone as a biomarker, we achieved better results than previous studies, whether they focused solely on voice or incorporated multiple modalities. These results show that combining self-reported voice data with state-of-the-art DL architectures can substantially improve PD prediction and diagnosis, potentially leading to better patient outcomes.