Tetanus is a life-threatening infectious disease that remains common in low- and middle-income countries, including Vietnam. It is characterized by muscle spasms and, in severe cases, is complicated by autonomic dysfunction. Ideally, continuous vital sign monitoring using bedside monitors allows prompt detection of the onset of autonomic nervous system dysfunction and helps avoid rapid deterioration. Detection can be improved by analyzing heart rate variability (HRV) derived from ECG signals. Recently, characteristic ECG and HRV features have been shown to be of value in classifying tetanus severity. However, conventional manual analysis of ECG is time-consuming. Traditional convolutional neural networks (CNNs) are limited in their ability to extract global context information because of their fixed-size kernel filters. In this work, we propose a novel hybrid CNN-Transformer model that automatically classifies tetanus severity from ECG data recorded with low-cost wearable sensors during tetanus monitoring. The model captures local features through the CNN and global features through the Transformer. The one-dimensional ECG signal is transformed into a time-series image, a spectrogram, which is used as the input to the proposed model. The CNN-Transformer model outperforms state-of-the-art methods in classifying tetanus severity, achieving an F1 score of 0.82 ± 0.03, precision of 0.94 ± 0.03, recall of 0.73 ± 0.07, specificity of 0.97 ± 0.02, accuracy of 0.88 ± 0.01 and AUC of 0.85 ± 0.03. In addition, we found that a Random Forest classifier with a sufficient number of manually selected features can perform comparably to the proposed CNN-Transformer model.
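The abstract does not state how the spectrogram is computed. A common choice is the power of a short-time Fourier transform (STFT), and the sketch below illustrates that transformation under stated assumptions: a Hann window, and hypothetical values for the segment length (`nperseg`), overlap (`noverlap`) and sampling rate. It is not the authors' preprocessing pipeline. To stay dependency-free it uses a naive DFT; in practice an FFT routine from a numerical library would be used instead.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def hann_window(n: int) -> List[float]:
    """Symmetric Hann window of length n (tapers frame edges to reduce leakage)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(
    signal: Sequence[float], fs: float, nperseg: int = 256, noverlap: int = 128
) -> Tuple[List[float], List[float], List[List[float]]]:
    """Return (freqs, times, power), where power[t][k] = |STFT(t, k)|^2.

    Hypothetical parameters: nperseg (samples per segment) and noverlap
    (samples shared by consecutive segments) are illustrative defaults,
    not values taken from the paper.
    """
    if not 0 <= noverlap < nperseg:
        raise ValueError("require 0 <= noverlap < nperseg")
    if len(signal) < nperseg:
        raise ValueError("signal is shorter than one segment")
    step = nperseg - noverlap
    window = hann_window(nperseg)
    n_bins = nperseg // 2 + 1  # one-sided spectrum of a real-valued signal
    freqs = [k * fs / nperseg for k in range(n_bins)]
    times: List[float] = []
    power: List[List[float]] = []
    for start in range(0, len(signal) - nperseg + 1, step):
        # Apply the window to one segment of the 1-D signal.
        frame = [signal[start + i] * window[i] for i in range(nperseg)]
        row = []
        for k in range(n_bins):
            # Naive DFT of bin k: O(nperseg) per bin, O(nperseg^2) per frame.
            coeff = sum(
                frame[i] * cmath.exp(-2j * math.pi * k * i / nperseg)
                for i in range(nperseg)
            )
            row.append(abs(coeff) ** 2)
        power.append(row)
        times.append((start + nperseg / 2) / fs)  # segment centre, in seconds
    return freqs, times, power


if __name__ == "__main__":
    fs = 250.0  # hypothetical sampling rate in Hz
    # Synthetic stand-in for a signal: 2 s of a 10 Hz tone.
    x = [math.sin(2 * math.pi * 10.0 * n / fs) for n in range(500)]
    freqs, times, power = spectrogram(x, fs, nperseg=250, noverlap=125)
    peak = max(range(len(freqs)), key=lambda k: power[0][k])
    print(len(times), freqs[peak])  # 3 10.0
```

Each row of `power` is one time step and each column one frequency bin, so the resulting 2-D array can be treated as an image. Real pipelines often also log-scale and normalize this array before passing it to a CNN; whether the paper does so is not stated in the abstract.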