Brain tumors, which arise from uncontrolled cell growth in the central nervous system, present substantial challenges in medical diagnosis and treatment. Early and accurate detection is essential for effective intervention. This study aims to improve the detection and classification of brain tumors in Magnetic Resonance Imaging (MRI) scans using a hybrid framework that combines Vision Transformer (ViT) and Gated Recurrent Unit (GRU) models. We used primary MRI data from Bangabandhu Sheikh Mujib Medical College Hospital (BSMMCH) in Faridpur, Bangladesh. The hybrid ViT-GRU model uses the ViT to extract essential features and the GRU to identify relationships among those features; it also addresses class imbalance and outperforms existing diagnostic methods. We preprocessed the dataset extensively, trained the model with three optimizers (SGD, Adam, and AdamW), and evaluated it using 10-fold cross-validation. Additionally, we incorporated Explainable Artificial Intelligence (XAI) techniques (Attention Map, SHAP, and LIME) to enhance the interpretability of the model’s predictions. On the primary dataset, BrTMHD-2023, the ViT-GRU model achieved precision, recall, and F1-score of 97%. The highest accuracies obtained with the SGD, Adam, and AdamW optimizers were 81.66%, 96.56%, and 98.97%, respectively. Our model outperformed existing transfer learning models by 1.26%, as validated through comparative analysis and cross-validation. The proposed model also performs well on a separate brain tumor dataset from Kaggle, achieving 96.08% accuracy and outperforming existing research on the same dataset. The proposed ViT-GRU framework significantly improves the detection and classification of brain tumors in MRI scans. The integration of XAI techniques enhances the model’s transparency and reliability, fostering trust among clinicians and facilitating clinical application.
Future work will expand the dataset and apply these findings to real-time diagnostic devices.