Myocardial infarction (MI) remains a major contributor to global mortality and morbidity, which makes accurate and timely diagnosis essential. Current diagnostic methods struggle to capture intricate patterns, motivating advanced automated approaches to MI detection. In this study, we propose a hybrid approach that combines the strengths of ResNet and Vision Transformer (ViT) models, leveraging both global and local features for improved accuracy. To address ViT’s limitations, we introduce a slim ViT design that uses multibranch networks and channel attention mechanisms to improve patch embedding extraction. The data are passed through both the ResNet and the modified ViT, forming a dual-pathway feature extraction strategy. The global and local features from the two pathways are then fused into a single comprehensive feature vector, which addresses the challenge of building a robust feature representation. Together, the modified ViT and ResNet architectures enhance the model's learning capability and highlight the synergy between global and local feature extraction. Preliminary results indicate significant potential for accurate MI detection. The approach holds promise for improving MI classification accuracy, with implications for improved patient care. Further validation and exploration of clinical applicability are warranted.
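The dual-pathway idea described above can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the function names, the toy numbers, the single-weight gating (a simplification in the spirit of squeeze-and-excitation), and the choice of concatenation as the fusion step are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(channels, gate_weights):
    """Reweight channels with a simplified attention gate (hypothetical).

    channels: list of channels, each a list of float activations.
    gate_weights: one assumed "learned" scalar per channel.
    """
    # Squeeze: summarize each channel by its mean activation.
    descriptors = [sum(c) / len(c) for c in channels]
    # Excite: turn each descriptor into a gate in (0, 1).
    gates = [sigmoid(w * d) for w, d in zip(gate_weights, descriptors)]
    # Scale: multiply every activation by its channel's gate.
    return [[g * v for v in c] for g, c in zip(gates, channels)]


def fuse(resnet_features, vit_features):
    """Fuse the two pathways by concatenation (an assumed fusion choice)."""
    return list(resnet_features) + list(vit_features)


def classify(fused, weights, bias):
    """Linear head plus sigmoid, giving a score in (0, 1) for the MI class."""
    return sigmoid(sum(w * f for w, f in zip(weights, fused)) + bias)


# Toy stand-ins for the outputs of the two pathways.
resnet_features = [0.2, 0.4]                      # placeholder ResNet features
patch_channels = [[1.0, 3.0], [0.0, 0.0]]         # placeholder ViT patch embeddings
attended = channel_attention(patch_channels, gate_weights=[0.0, 1.0])
vit_features = [v for channel in attended for v in channel]

fused = fuse(resnet_features, vit_features)
score = classify(fused, weights=[0.0] * len(fused), bias=0.0)
```

In a real model the placeholder lists would be replaced by the outputs of trained ResNet and ViT backbones, and the gate and classifier weights would be learned rather than fixed.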