Background/Objectives: Breast cancer is a leading cause of mortality among women in Taiwan and globally. Non-invasive imaging methods, such as mammography and ultrasound, are critical for early detection, yet each modality used alone has limited diagnostic accuracy. This study aims to improve breast cancer detection through a cross-modality fusion approach that combines mammography and ultrasound imaging using convolutional neural network (CNN) architectures. Materials and Methods: Breast images were obtained from public datasets, including RSNA, PAS, and Kaggle, and categorized into malignant and benign groups. Data augmentation was used to address imbalance in the ultrasound dataset. Three models were developed: (1) pre-trained CNNs combined with machine learning classifiers, (2) transfer learning-based CNNs, and (3) a custom-designed 17-layer CNN for direct classification. Model performance was evaluated using metrics including accuracy and the kappa score. Results: The custom 17-layer CNN outperformed the other models, achieving an accuracy of 0.964 and a kappa score of 0.927. The transfer learning model achieved moderate performance (accuracy 0.846, kappa 0.694), while the pre-trained CNNs with machine learning classifiers yielded the lowest results (accuracy 0.780, kappa 0.559). Cross-modality fusion proved effective in leveraging the complementary strengths of mammography and ultrasound imaging. Conclusions: This study demonstrates the potential of cross-modality imaging and tailored CNN architectures to substantially improve diagnostic accuracy and reliability in breast cancer detection. The custom-designed model offers a practical option for early detection, with the potential to reduce false positives and false negatives and to improve patient outcomes through timely and accurate diagnosis.
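Accuracy and kappa measure different things. Accuracy is the fraction of correct predictions. Kappa discounts the agreement that would be expected by chance given the label frequencies. The sketch below shows how both could be computed for a binary malignant/benign task. It assumes the reported kappa score is Cohen's kappa, which the abstract does not state explicitly. The function name `accuracy_and_kappa` and the toy labels are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences.

    Assumption: "kappa score" means Cohen's kappa between true and predicted labels.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement p_o: fraction of samples where prediction matches truth (= accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Expected chance agreement p_e: sum over labels of P(true = k) * P(pred = k).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum((true_counts[k] / n) * (pred_counts[k] / n) for k in labels)

    if p_e == 1.0:
        # Both sequences use one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Hypothetical toy labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")  # accuracy=0.800, kappa=0.600
```

In this toy example both classes are balanced, so chance agreement is 0.5. An accuracy of 0.8 therefore corresponds to a kappa of 0.6. This illustrates why kappa values in the study are lower than the matching accuracies.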