2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01618

Progressive Temporal Feature Alignment Network for Video Inpainting

Citation Types: 0 supporting, 36 mentioning, 0 contrasting

Year Published: 2022–2024

Cited by 48 publications (36 citation statements)
References 27 publications

“…There are a few recent methods that build on STTN to generate higher-resolution inpainted textures. These include introducing an Aggregated Contextual-Transformation GAN (AOT-GAN) Zeng et al (2021), combining 3D CNNs with a temporal shift and align module Zou et al (2021), and introducing a Deformable Alignment and Pyramid Context Completion Network with temporal attention Wu et al (2021). Additionally, more complex occlusions can be handled with a Decoupled Spatial-Temporal Transformer with a hierarchical encoder Liu et al (2021).…”
Section: Temporal Video Inpainting (mentioning)
confidence: 99%
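
The "temporal shift and align" phrase above refers to moving slices of feature channels between neighboring frames so that plain 2D convolutions can mix temporal context. Below is a minimal sketch of the generic TSM-style shift, assuming a (batch, time, channels, height, width) feature layout; the function name and shift fraction are illustrative assumptions, not the actual module from Zou et al (2021):

import torch

def temporal_shift(x: torch.Tensor, shift_frac: float = 0.125) -> torch.Tensor:
    # x: (batch, time, channels, height, width) video features
    b, t, c, h, w = x.shape
    n = int(c * shift_frac)                    # channels shifted in each direction
    out = torch.zeros_like(x)
    out[:, 1:, :n] = x[:, :-1, :n]             # first slice moves forward in time
    out[:, :-1, n:2 * n] = x[:, 1:, n:2 * n]   # second slice moves backward in time
    out[:, :, 2 * n:] = x[:, :, 2 * n:]        # remaining channels stay in place
    return out

feats = torch.randn(2, 5, 64, 32, 32)
assert temporal_shift(feats).shape == feats.shape

Zero-padding at the temporal boundaries is one common choice; aligning the shifted features before fusing them (the "align" part) is what distinguishes the paper's module from a plain shift.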
“…Some methods [7,23,28,50] employing 3D convolution and attention usually yield temporally inconsistent results due to their limited temporal receptive fields. To generate more temporally coherent results, many works [23,67] regard optical flow as a strong prior for video inpainting and incorporate it into the network. However, directly computing optical flow between images within invalid regions is extremely difficult, as these regions themselves become occlusion factors, restricting performance.…”
Section: Related Work (mentioning)
confidence: 99%
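
To make the flow-as-prior idea concrete, the sketch below backward-warps a neighboring frame with an optical flow field and pastes the warped pixels into the invalid region only. The tensor shapes, the warp/fill_masked names, and the convention that mask = 1 marks missing pixels are assumptions for illustration, not any cited method's implementation:

import torch
import torch.nn.functional as F

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # frame: (B, C, H, W); flow: (B, 2, H, W) in pixels, channel 0 = x, 1 = y
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().to(frame.device)  # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                      # sampling positions
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0                # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                   # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

def fill_masked(target, neighbor, flow, mask):
    # mask: (B, 1, H, W), 1 = missing; keep valid pixels, fill holes from the warp
    return target * (1 - mask) + warp(neighbor, flow) * mask

The difficulty the excerpt points out sits upstream of this step: inside the hole there are no pixels to match, so the flow itself must first be estimated or completed.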
“…To address these flaws, in this paper we carefully design three trainable modules, namely (1) flow completion, (2) feature propagation, and (3) content hallucination, which simulate the corresponding stages of flow-based methods and together constitute an End-to-End framework for Flow-Guided Video Inpainting (E2FGVI). Such close collaboration between the three modules alleviates the excessive dependence on intermediate results seen in previously independently developed systems [17,23,26,57,67] and works in a more efficient manner.…”
Section: Introduction (mentioning)
confidence: 99%
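
Structurally, the three-stage design described in this excerpt amounts to chaining three trainable sub-networks inside one module so that a single loss reaches every stage. This is a hedged organizational sketch only; the sub-module internals and call signatures are placeholders, not E2FGVI's actual layers:

import torch.nn as nn

class FlowGuidedInpainter(nn.Module):
    def __init__(self, flow_net: nn.Module, prop_net: nn.Module, hall_net: nn.Module):
        super().__init__()
        self.flow_net = flow_net  # (1) completes optical flow inside the masks
        self.prop_net = prop_net  # (2) propagates features along the completed flow
        self.hall_net = hall_net  # (3) hallucinates content propagation cannot reach

    def forward(self, frames, masks):
        flows = self.flow_net(frames, masks)
        feats = self.prop_net(frames, masks, flows)
        return self.hall_net(feats)  # gradients flow back through all three stages

Training end to end is exactly what removes the dependence on hand-offs between separately optimized stages.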
“…In general, direct synthesis methods adopt convolution-based [3,4,7,23] and attention-based [13,22] networks. They usually take corrupted frames and corresponding pixel-wise masks as input and directly output completed frames.…”
Section: Introduction (mentioning)
confidence: 99%
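
The input convention described here is usually implemented by zeroing the corrupted pixels and concatenating the mask as an extra channel. A minimal sketch with a placeholder backbone (a single convolution standing in for whichever convolution- or attention-based network is used):

import torch
import torch.nn as nn

class DirectSynthesis(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(4, 3, kernel_size=3, padding=1)  # placeholder backbone

    def forward(self, frame: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # frame: (B, 3, H, W); mask: (B, 1, H, W), 1 = corrupted pixel
        x = torch.cat([frame * (1 - mask), mask], dim=1)  # (B, 4, H, W) input
        return self.net(x)                                # completed frame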
“…However, due to the computational complexity, these methods have relatively small temporal windows, so only a few reference frames can be used (up to 10 frames). Serious temporal inconsistencies can occur if some key frames […]
[Figure caption: (a) The rear camel is originally shown in all frames, but it disappears in the results of [22,23] due to the small temporal window. (b) Wrong pixels are propagated, and a brightness inconsistency issue occurs in [6,20].]…”
Section: Introduction (mentioning)
confidence: 99%
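
The temporal-window limitation comes down to how reference frames are sampled. A common pattern (illustrative only, not taken from any cited paper) combines a few local neighbors with strided distant frames; once a hard cap of around 10 frames applies, distant frames get dropped and content seen only there, like the occluded camel above, can vanish:

def sample_reference_frames(t, num_frames, radius=2, stride=10):
    # Local temporal neighbors of the target frame t, excluding t itself.
    local = [i for i in range(t - radius, t + radius + 1)
             if 0 <= i < num_frames and i != t]
    # Distant frames sampled with a fixed stride across the whole video.
    distant = [i for i in range(0, num_frames, stride)
               if i not in local and i != t]
    return local + distant

# sample_reference_frames(25, 60) -> [23, 24, 26, 27, 0, 10, 20, 30, 40, 50]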