Malware classification is a critical task in cybersecurity, as it offers insights into the threats that malware poses to the victim device and helps in the design of countermeasures. For realtime malware classification, due to the large amount of potential malware present in the network, there is a challenge of achieving high classification accuracy while maintaining low inference latency. We first introduce two self-attention transformer-based classifiers, SeqConvAttn and ImgConvAttn, to replace the currently predominant CNN-based classifiers. We then devise a file-size-aware two-stage framework to combine the two proposed models, thereby controlling the tradeoff between accuracy and latency for realtime classification. To assess our proposed designs, we conduct experiments on three malware datasets, the Microsoft Malware Classification Challenge (BIG 2015) and two selected subsets from the BODMAS PE malware dataset, BODMAS-11 and BODMAS-49. We show that our transformer-based designs can achieve better classification accuracy than traditional CNN-based designs. Furthermore, we show that the proposed two-stage framework can reduce the average model inference latency while maintaining superior accuracy, thereby fulfilling the requirements of real-time classification.