“…The proposed approach is compared against the top-scoring approaches of the literature on the three employed datasets, specifically, TBN [44], BAT [16], MARS [62], Fast-S3D [38], RMS [64], CGNL [30], ATFR [72], Ada3D [17], TCPNet [45], LgNet [68], ST-VLAD [50], PivotCorrNN [53], LiteEval [57], AdaFrame [54], Listen to Look [56], SCSampler [73], AR-Net [7], SMART [59], ObjectGraphs [5], MARL [55], FrameExit [6] and AdaFocusV2 [19] (note that not all of these works report results for all the datasets used in the present work). The reported results on FCVID, MiniKinetics and ActivityNet are shown in Tables 1, 2 and 3, respectively.…”

Method                                      mAP(%)
AdaFrame [54]                               71.5
Listen to Look [56]                         72.3
LiteEval [57]                               72.7
SCSampler [73]                              72.9
AR-Net [7]                                  73.8
FrameExit [6]                               77.3
AdaFocusV2 [19]                             79.0
AR-Net (EfficientNet backbone) [7]          79.7
MARL (ResNet backbone on Kinetics) [55]     82.9
FrameExit (X3D-S backbone) [6]              87