“…While the prior SL literature focuses largely on techniques such as Hidden Markov Models (HMMs) for sequence modeling over handcrafted features, recent studies employ 2D/3D CNN- and RNN-based architectures that operate directly on raw frames or skeleton joint information (Aran, 2008; Camgöz et al., 2016a; Koller et al., 2016, 2019; Zhang et al., 2016; Mittal et al., 2019; Abdullahi and Chamnongthai, 2022; Samaan et al., 2022). More recently, Transformer-based architectures have become popular for SLR and Sign Language Translation (SLT) tasks due to their success in domains such as Natural Language Processing (NLP) and Speech Processing (SP) (Vaswani et al., 2017; Camgoz et al., 2020b; Rastgoo et al., 2020; Boháček and Hrúz, 2022; Cao et al., 2022; Chen et al., 2022; Hrúz et al., 2022; Hu et al., 2022; Xie et al., 2023).…”