2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.01692

RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution

Cited by 64 publications (23 citation statements)
References 40 publications
“…The entire compression pipeline is depicted in Stage (I) of Figure 2. We remark that a pre-trained M 0 is not essential for the purpose of compression since problem (2) is sufficient for a one-shot training/pruning, but M 0 allows us to ensure the compressed model performs competitively.…”
Section: General Methodology (mentioning)
confidence: 99%
“…Video frame interpolation is a low-level computer vision task that involves creating interim (non-existent) frames between actual frames in a sequence to greatly improve the temporal resolution. It plays an important role in many applications, including frame rate up-conversion [1], [2], slow-motion generation [3], and novel view synthesis [4], [5]. Though fundamental, the problem is challenging in that the complex motion, occlusion and feature variation in real world videos are difficult to estimate and predict in a transparent way.…”
Section: Introduction (mentioning)
confidence: 99%
“…To interpolate an arbitrary number of intermediate frames for STVSR, Xu et al [31] proposed a temporal modulation network, which is achieved by temporal modulation block under the deformable convolution framework for controllable feature interpolation. Recently, Geng et al [32] proposed a single spatial temporal Transformer architecture that incorporates the temporal interpolation and spatial super-resolution modules for the STVSR task. To sufficiently aggregate the spatio-temporal information, Hu et al [33] proposed an efficient recurrent network with bidirectional interactive propagation module for STVSR task, where only one alignment and fusion are required.…”
Section: Two-stage and One-stage Space-time Video Super-resolution (mentioning)
confidence: 99%
“…For the four super-resolution methods, Bicubic and RCAN [47] are single-image super-resolution methods, RBPN [15] and EDVR [21] are recent VSR methods. In addition, the one-stage STVSR methods used for comparison include Zooming SlowMo [30] and RSTT [32].…”
Section: Evaluation On Space-time Video Super-resolution (mentioning)
confidence: 99%
“…Based on the swin transformer and convolution, a spatiotemporal model is constructed in this study to perform the bias correction task and the temporal downscaling task, as shown in Figure 2. The spatio-temporal model based on the swin transformer has been designed and applied to video super-resolution tasks and action recognition tasks (Geng et al, 2022; Liu et al, 2022). The entire framework consists of an encoder, a decoder, and a query builder.…”
Section: Model (mentioning)
confidence: 99%