“…Action recognition using different modalities (Sun et al., 2022; Özyer et al., 2021) (e.g., video, skeleton) (Li et al., 2023; Song et al., 2021; Yang et al., 2021; Hu et al., 2020; Zhang et al., 2020b; Wang et al., 2018) has been widely studied because of its many potential applications, such as autonomous driving and video surveillance. Compared with conventional RGB video, the 3D skeleton provides a high-level representation that is lightweight and robust to both viewpoint differences and complicated backgrounds.…”