Tetanus is a life-threatening bacterial infection that is often prevalent in low- and middle-income countries (LMIC), Vietnam included. Tetanus affects the nervous system, leading to muscle stiffness and spasms. Moreover, severe tetanus is associated with autonomic nervous system (ANS) dysfunction. To ensure early detection and effective management of ANS dysfunction, patients require continuous monitoring of vital signs using bedside monitors. Wearable electrocardiogram (ECG) sensors offer a more cost-effective and user-friendly alternative to bedside monitors. Machine learning-based ECG analysis can be a valuable resource for classifying tetanus severity; however, using existing ECG signal analysis is excessively time-consuming. Due to the fixed-sized kernel filters used in traditional convolutional neural networks (CNNs), they are limited in their ability to capture global context information. In this work, we propose a 2D-WinSpatt-Net, which is a novel Vision Transformer that contains both local spatial window self-attention and global spatial self-attention mechanisms. The 2D-WinSpatt-Net boosts the classification of tetanus severity in intensive-care settings for LMIC using wearable ECG sensors. The time series imaging—continuous wavelet transforms—is transformed from a one-dimensional ECG signal and input to the proposed 2D-WinSpatt-Net. In the classification of tetanus severity levels, 2D-WinSpatt-Net surpasses state-of-the-art methods in terms of performance and accuracy. It achieves remarkable results with an F1 score of 0.88 ± 0.00, precision of 0.92 ± 0.02, recall of 0.85 ± 0.01, specificity of 0.96 ± 0.01, accuracy of 0.93 ± 0.02 and AUC of 0.90 ± 0.00.