“…A typical use of the Transformer architecture in NLP is to encode the meaning of a word given the surrounding words, sentences, and paragraphs. Beyond NLP, the Transformer architecture has also been applied to music generation [43], image generation [44], image and video restoration [45-49], game-playing agents [50,51], and drug discovery [52,53]. In this work, we explore how our attention-based architecture, CrabNet, performs in predicting materials properties relative to the common modeling techniques Roost, ElemNet, and random forest (RF) for regression-type problems.…”
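To make the "attention-based" part concrete, the following is a minimal sketch of generic scaled dot-product self-attention, the core operation of the Transformer described above. It is an illustrative toy example, not the CrabNet implementation; the function and weight names (`self_attention`, `Wq`, `Wk`, `Wv`) are assumptions introduced here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a set of token embeddings.

    X          : (n_tokens, d_model) input embeddings (words in NLP, or, by
                 analogy, the elements of a chemical formula).
    Wq, Wk, Wv : learned projection matrices of shape (d_model, d_k).
    Returns context-aware embeddings of shape (n_tokens, d_k).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise similarity of tokens
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # each output mixes all tokens

# Toy usage: 4 "tokens" with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # -> (4, 8)
```

Each output row is a weighted combination of every input token, which is what lets the representation of one token (a word, or an element in a formula) depend on its full context.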