Advances in deep learning techniques have helped researchers acquire and process multimodal data signals from different healthcare domains. The focus has now shifted towards providing end-to-end solutions, i.e., processing these data and developing models that can be deployed directly on edge devices. To achieve this, researchers address two problems: (i) reducing complex feature dependencies and (ii) reducing the complexity of the deep learning model without compromising accuracy. In this paper, we focus on the latter, reducing the complexity of the model through the knowledge distillation framework. We apply knowledge distillation to a Vision Transformer model trained on the MIT-BIH Arrhythmia Database. A tenfold cross-validation technique was used to validate the model, yielding a 99.7% F1 score and 99.3% accuracy. The model was further tested on the Xilinx Alveo U50 FPGA accelerator and was found suitable for implementation on low-powered wearable devices.
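The distillation framework referred to above is commonly implemented as a weighted combination of a soft-target term (teacher logits softened by a temperature) and a hard-label cross-entropy term. A minimal PyTorch sketch of this standard Hinton-style loss follows; the temperature `T`, weight `alpha`, and class count are illustrative assumptions, not the paper's actual hyperparameters:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Standard knowledge distillation loss (illustrative hyperparameters).

    Combines a KL divergence between temperature-softened teacher and
    student distributions (scaled by T^2 to keep gradient magnitudes
    comparable) with the usual cross-entropy on the ground-truth labels.
    """
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage: a batch of 8 beats over 5 hypothetical arrhythmia classes.
student_logits = torch.randn(8, 5, requires_grad=True)
teacher_logits = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()  # gradients flow only into the student
```

In practice the teacher is a large pretrained Vision Transformer and the student is the compact model destined for the edge device; only the student's parameters receive gradients.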