PURPOSE: Deep learning methods for classifying prostate cancer (PCa) in ultrasound images typically employ convolutional neural networks (CNNs) to detect cancer in small regions of interest (ROIs) along a needle trace region. However, this approach suffers from weak labelling, since the ground-truth histopathology labels do not describe the properties of individual ROIs. Recently, multi-scale approaches have sought to mitigate this issue by combining the context awareness of transformers with a convolutional feature extractor to detect cancer from multiple ROIs using multiple-instance learning (MIL). In this work, we present a detailed study of several image transformer architectures for both ROI-scale and multi-scale classification, and a comparison of the performance of CNNs and transformers for ultrasound-based prostate cancer classification. We also design a novel multi-objective learning strategy that combines both ROI and core predictions to further mitigate label noise.
METHODS: We use a dataset of 6607 prostate biopsy cores extracted from 693 patients at 5 distinct clinical centers. We evaluate 3 image transformers on ROI-scale cancer classification, then use the strongest model to tune a multi-scale classifier with MIL, wherein another transformer is fine-tuned on top of the existing model's features. We train our MIL models using our novel multi-objective learning strategy and compare our results to existing baselines.
RESULTS: We find that for both ROI-scale and multi-scale PCa detection, image transformer backbones lag behind their CNN counterparts. This deficit in performance is even more noticeable for larger models. When using multi-objective learning, we are able to improve the performance of MIL models, reaching an AUROC of 77.9%, a sensitivity of 75.9%, and a specificity of 66.3%, a considerable improvement over the baseline.
CONCLUSION: We conclude that convolutional networks are better suited for modelling sparse datasets of prostate ultrasounds, producing more robust features than their transformer counterparts in PCa detection. Multi-scale methods remain the best-performing approach for this task, with multi-objective learning presenting an effective way to improve performance.
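
As an illustration of what combining ROI and core predictions in a single objective could look like, the following is a minimal sketch and not the authors' implementation. It assumes a binary cross-entropy term on the core-level prediction plus a weighted cross-entropy term on each ROI prediction, with the core's histopathology label reused as a weak label for every ROI. The function names, the additive form of the loss, and the `roi_weight` parameter are all hypothetical; the abstract does not specify them.

```python
import numpy as np


def binary_cross_entropy(prob, label, eps=1e-7):
    """Element-wise binary cross-entropy for probabilities in (0, 1)."""
    prob = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return -(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob))


def multi_objective_loss(roi_probs, core_prob, core_label, roi_weight=0.5):
    """Hypothetical combined loss for one biopsy core.

    roi_probs  : per-ROI cancer probabilities from the feature extractor head
    core_prob  : single core-level probability from the MIL aggregator
    core_label : histopathology label of the core (0.0 benign, 1.0 cancer)
    roi_weight : assumed trade-off between the two objectives
    """
    roi_probs = np.asarray(roi_probs, dtype=float)
    # Core-level objective: the label describes the whole core.
    core_loss = float(binary_cross_entropy(core_prob, core_label))
    # ROI-level objective: the core label is only a weak (noisy) ROI label.
    roi_loss = float(np.mean(binary_cross_entropy(roi_probs, core_label)))
    return core_loss + roi_weight * roi_loss


# Example: a cancerous core with three ROIs.
example_loss = multi_objective_loss([0.8, 0.6, 0.3], core_prob=0.7, core_label=1.0)
```

Under these assumptions, setting `roi_weight` to 0 recovers a purely core-level MIL objective, while larger values place more trust in the weak ROI labels.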