2018
DOI: 10.1007/978-3-030-01219-9_7
Spatio-Temporal Transformer Network for Video Restoration

Cited by 129 publications (59 citation statements)
References 49 publications
“…A typical use of the Transformer architecture in NLP is to encode the meaning of a word given the surrounding words, sentences, and paragraphs. Beyond NLP, other example uses of the Transformer architecture are found in music generation 43 , image generation 44 , image and video restoration [45][46][47][48][49] , game playing agents 50,51 , and drug discovery 52,53 . In this work, we explore how our attention-based architecture, CrabNet, performs in predicting materials properties relative to the common modeling techniques Roost, ElemNet, and random forest (RF) for regression-type problems.…”
Section: Introductionmentioning
confidence: 99%
“…Over the past decade, an increasing number of works have focused on quality enhancement for compressed images (Foi, Katkovnik, and Egiazarian 2007; Jancsary, Nowozin, and Rother 2012; Chang, Ng, and Zeng 2013; Zhang et al. 2013; Dong et al. 2015; Guo and Chao 2016; Zhang et al. 2017). Temporal fusion with motion compensation was introduced for video by Caballero et al. (2017). Since then, it has been widely adopted for various vision tasks (Xue et al. 2017; Yang et al. 2018; Kim et al. 2018; Guan et al. 2019). However, these methods rely heavily on accurate optical flow, which is hard to obtain due to general problems (e.g., occlusion, large motion) or task-specific problems (e.g., compression artifacts).…”
Section: Related Workmentioning
confidence: 99%
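The "temporal fusion with motion compensation" the excerpt above describes amounts to warping a neighboring frame toward the reference frame using a dense optical-flow field before fusing them. A minimal sketch of that backward-warping step, assuming a precomputed flow and nearest-neighbor sampling (function name and setup are illustrative, not from the cited papers):

```python
import numpy as np

def warp_frame(frame, flow):
    """Backward-warp a neighboring frame toward the reference frame
    using a dense optical-flow field with nearest-neighbor sampling.
    frame: (H, W) array; flow: (H, W, 2) array of (dy, dx) offsets."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Look up each output pixel at its flow-displaced source position.
    # Clipping at the border is a crude fallback; occlusion and large
    # motion (as the excerpt notes) make flow-based warping unreliable.
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

# A neighbor frame shifted one pixel to the right is realigned with the
# reference by a uniform flow of dx = +1 (away from the wrapped border).
ref = np.arange(16.0).reshape(4, 4)
neighbor = np.roll(ref, 1, axis=1)
flow = np.zeros((4, 4, 2))
flow[..., 1] = 1.0
warped = warp_frame(neighbor, flow)
```

Once the neighbor is aligned, fusion can be as simple as averaging it with the reference; the cited methods learn the fusion instead, but they share this alignment step.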
“…On the other hand, several approaches use additional prior information, such as depth [63] or semantic labels [112], to guide the deblurring process. In addition, 2D convolutions are adopted by all video deblurring methods [2, 41, 53, 83, 85, 116, 120, 132, 138]. The main difference between single-image and video deblurring is the use of 3D convolutions, which can extract features from both the spatial and temporal domains [151].…”
Section: Basic Layers and Blocksmentioning
confidence: 99%
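The distinction the excerpt draws is that a 3D kernel slides over time as well as space, so each output value mixes information from several frames at once. A minimal sketch of a "valid" 3D convolution over a (T, H, W) clip (naive loops for clarity; the function is illustrative, not from the cited works):

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution (cross-correlation, as in deep
    learning) of a (T, H, W) clip with a (kt, kh, kw) kernel. Every
    output value aggregates a spatio-temporal neighborhood, unlike a
    2D convolution applied frame by frame."""
    t, h, w = clip.shape
    kt, kh, kw = kernel.shape
    out = np.empty((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(
                    clip[i:i + kt, j:j + kh, k:k + kw] * kernel)
    return out

# A 3x3x3 averaging kernel over a constant 3-frame clip: the output
# collapses the temporal axis (valid mode) while blending space.
clip = np.ones((3, 4, 4))
kernel = np.full((3, 3, 3), 1.0 / 27.0)
out = conv3d_valid(clip, kernel)
```

Setting kt = 1 recovers an ordinary per-frame 2D convolution, which makes the excerpt's point concrete: the temporal extent of the kernel is the only structural difference.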