“…The remaining methods, namely multi-task hierarchical clustering [57], BT-LSTM [58], deep autoencoder [59], two-stream attention LSTM [60], weighted entropy-variance based feature selection [61], dilated CNN+BiLSTM+RB [62], DS-GRU [43], and local-global features + QSVM [63], obtain accuracies of 89.7%, 85.3%, 96.2%, 96.9%, 94.5%, 89.0%, 97.1%, and 82.6%, respectively.

Method                                                    Year   Accuracy (%)
Multi-task hierarchical clustering [57]                   2017   89.7
BT-LSTM [58]                                              2018   85.3
Deep autoencoder [59]                                     2019   96.2
STDN [56]                                                 2020   98.2
Two-stream attention LSTM [60]                            2020   96.9
Weighted entropy-variance based feature selection [61]    2021   94.5
Dilated CNN+BiLSTM+RB [62]                                2021   89.0
DS-GRU [43]                                               2021   97.1
Local-global features + QSVM [63]                         2021   82.6
DA-CNN+Bi-GRU (Proposed)                                  2022   98.0

For the UCF50 dataset, the proposed method outperforms the state-of-the-art methods with the best accuracy of 97.5%, whereas (LD-BF) + (LD-DF) [64] obtains the second-best accuracy of 96.7%. The local-global features + QSVM [63] achieves the lowest accuracy of 69.4%, whereas the remaining methods, including multi-task hierarchical clustering [57], deep autoencoder [59], ensemble model with swarm-based optimization [65], and DS-GRU [43], obtain […]

Finally, for the HMDB51 dataset, which comprises challenging action videos, our proposed method achieves the best result with an accuracy of 79.3%, whereas the runner-up method, evidential deep learning [66], attains an accuracy of 77.0%.…”