2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.01943
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

Cited by 40 publications (16 citation statements)
References 56 publications

“…ii) Concerning our ViGAT variant that utilizes a ResNet backbone pretrained on ImageNet, this outperforms the best-performing literature approaches that similarly use a ResNet backbone in FCVID and ActivityNet (see Tables 1 and 3). Specifically, we observe a significant performance gain of 1% over AdaFocusV2 [19], which is the previous state-of-the-art method. We also see that ViGAT provides a performance improvement of 1.4% over ObjectGraphs [5], which is the best previous bottom-up method.…”
Section: Event Recognition Results (mentioning; confidence: 72%)
“…The proposed approach is compared against the top-scoring approaches of the literature on the three employed datasets, specifically, TBN [44], BAT [16], MARS [62], Fast-S3D [38], RMS [64], CGNL [30], ATFR [72], Ada3D [17], TCPNet [45], LgNet [68], ST-VLAD [50], PivotCorrNN [53], LiteEval [57], AdaFrame [54], Listen to Look [56], SCSampler [73], AR-Net [7], SMART [59], ObjectGraphs [5], MARL [55], FrameExit [6] and AdaFocusV2 [19] (note that not all of these works report results for all the datasets used in the present work). The reported results on FCVID, MiniKinetics and ActivityNet are shown in Tables 1, 2 and 3, respectively.…”

Results table fragment from the citing paper, mAP (%):
AdaFrame [54]: 71.5
Listen to Look [56]: 72.3
LiteEval [57]: 72.7
SCSampler [73]: 72.9
AR-Net [7]: 73.8
FrameExit [6]: 77.3
AdaFocusV2 [19]: 79.0
AR-Net (EfficientNet backbone) [7]: 79.7
MARL (ResNet backbone on Kinetics) [55]: 82.9
FrameExit (X3D-S backbone) [6]: 87
Section: Event Recognition Results (mentioning; confidence: 99%)