Medical imaging and deep learning models are essential to the early identification and diagnosis of brain cancers, facilitating timely intervention and improving patient outcomes. This research paper investigates the integration of YOLOv5, a state-of-the-art object detection framework, with non-local neural networks (NLNNs) to improve brain tumor detection’s robustness and accuracy. This study begins by curating a comprehensive dataset comprising brain MRI scans from various sources. To facilitate effective fusion, the YOLOv5 and NLNNs, K-means+, and spatial pyramid pooling fast+ (SPPF+) modules are integrated within a unified framework. The brain tumor dataset is used to refine the YOLOv5 model through the application of transfer learning techniques, adapting it specifically to the task of tumor detection. The results indicate that the combination of YOLOv5 and other modules results in enhanced detection capabilities in comparison to the utilization of YOLOv5 exclusively, proving recall rates of 86% and 83% respectively. Moreover, the research explores the interpretability aspect of the combined model. By visualizing the attention maps generated by the NLNNs module, the regions of interest associated with tumor presence are highlighted, aiding in the understanding and validation of the decision-making procedure of the methodology. Additionally, the impact of hyperparameters, such as NLNNs kernel size, fusion strategy, and training data augmentation, is investigated to optimize the performance of the combined model.