2020
DOI: 10.48550/arxiv.2012.00364
Preprint

Pre-Trained Image Processing Transformer

Abstract: As the computing power of modern hardware increases rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. This progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution, and deraining) and develop a new pre-trained model, namely, the image processing transformer (IPT). To…

Cited by 83 publications (100 citation statements)
References 84 publications

“…Inspired by a series of recent vision transformer (ViT) works [6,7,10,50], we decide to use the ViT architecture, which has two advantages for the body reconstruction refinement task. Firstly, ViT follows a sequence-prediction format by regarding the input image as a sequence of local patches.…”
Section: Mesh Refinement Transformer
confidence: 99%
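
The excerpt above relies on ViT's view of an image as a sequence of local patches. A minimal PyTorch sketch of that patch-embedding step follows; the 16-pixel patch size and 768-dimensional embedding are illustrative assumptions, not values taken from the cited works.

import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Cut an image into non-overlapping patches and project each one to a token."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution both extracts and linearly projects the patches.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])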
“…To construct our transformer network, we first use a backbone network (e.g., ResNet) to extract image features [7]. Then three deconvolution layers are added on top of the backbone to upsample the feature map and recover more spatial information.…”
Section: Mesh Refinement Transformer
confidence: 99%
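
A hedged sketch of the backbone-plus-deconvolution design described in the excerpt above: a ResNet feature extractor followed by three transposed-convolution layers that progressively upsample the coarse feature map. The ResNet-50 choice and the 256-channel width are assumptions for illustration, not settings from the cited paper.

import torch
import torch.nn as nn
import torchvision.models as models

class DeconvHead(nn.Module):
    def __init__(self, out_channels=256):
        super().__init__()
        resnet = models.resnet50(weights=None)
        # Keep everything up to and including the last residual stage.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # (B, 2048, H/32, W/32)
        layers, in_ch = [], 2048
        for _ in range(3):  # three deconv layers, each doubling spatial resolution
            layers += [nn.ConvTranspose2d(in_ch, out_channels, kernel_size=4, stride=2, padding=1),
                       nn.BatchNorm2d(out_channels),
                       nn.ReLU(inplace=True)]
            in_ch = out_channels
        self.deconv = nn.Sequential(*layers)

    def forward(self, x):
        return self.deconv(self.backbone(x))

feat = DeconvHead()(torch.randn(1, 3, 256, 256))
print(feat.shape)  # torch.Size([1, 256, 64, 64]), i.e. H/4 x W/4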
“…Since the results are not particularly satisfactory, its follow-up UP-DETR (Dai et al 2020) puts forward a random query patch detection method and boosts the performance of DETR with faster convergence and higher precision. IPT (Chen et al 2020a) generates corrupted image pairs from ImageNet (Deng et al 2009) and pre-trains a transformer on them. By fine-tuning the model on low-level CV tasks such as denoising, super-resolution and deraining, IPT outperforms contemporaneous approaches.…”
Section: Related Work
confidence: 99%
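
The pre-training recipe mentioned here builds degraded/clean pairs from a clean image corpus. A rough sketch of two such synthetic degradations is shown below; the noise level and the bicubic x2 downscaling are illustrative assumptions, and the exact degradation settings are specified in the IPT paper itself.

import torch
import torch.nn.functional as F

def make_denoising_pair(clean, sigma=30 / 255):
    # Additive Gaussian noise: the corrupted input and the clean target.
    noisy = (clean + sigma * torch.randn_like(clean)).clamp(0, 1)
    return noisy, clean

def make_sr_pair(clean, scale=2):
    # Bicubic downsampling produces the low-resolution input for super-resolution.
    lr = F.interpolate(clean, scale_factor=1 / scale, mode='bicubic', align_corners=False)
    return lr.clamp(0, 1), clean

clean = torch.rand(1, 3, 96, 96)        # stand-in for a clean ImageNet crop
noisy, _ = make_denoising_pair(clean)
lr, _ = make_sr_pair(clean)
print(noisy.shape, lr.shape)            # (1, 3, 96, 96) and (1, 3, 48, 48)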
“…Among them, pre-training and meta-learning are two representative techniques, which have also been explored in image restoration. For instance, IPT [3] introduces a large-scale pre-training dataset to improve restoration performance w.r.t. the target distortion. Soh et al [31] propose a meta-learning based method to achieve fast adaptation for the zero-shot super-resolution task, achieving SOTA performance.…”
Section: Corresponding Author
confidence: 99%
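
The meta-learning idea referenced above amounts to taking a few task-specific gradient steps at test time. The toy sketch below shows a single MAML-style inner update; the tiny two-layer network, L1 loss, and inner learning rate are illustrative assumptions, not the actual MZSR procedure of Soh et al.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in restoration network (assumed purely for illustration).
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 3, 3, padding=1))
inner_lr = 1e-2

def fast_adapt(x, lr_img, hr_img):
    # One inner gradient step on a task-specific (lr, hr) pair ...
    loss = F.l1_loss(model(lr_img), hr_img)
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    w = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]
    # ... then run the adapted weights (the two conv layers) on the new input.
    h = F.relu(F.conv2d(x, w[0], w[1], padding=1))
    return F.conv2d(h, w[2], w[3], padding=1)

out = fast_adapt(torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 3, 32, 32])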
“…Pre-training based transfer learning. As a basic technique of transfer learning, pre-training has been widely applied to different vision tasks [3,15]. Pre-training based transfer learning can be divided into two processes, namely pre-training and fine-tuning.…”
Section: Knowledge Preliminary
confidence: 99%
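
A minimal sketch of the two-stage workflow described in this excerpt, assuming a torchvision ResNet-18 with publicly released ImageNet weights as the pre-trained model and a 10-class downstream task; both choices are illustrative, not taken from the cited works.

import torch.nn as nn
import torchvision.models as models

# Stage 1, pre-training, is represented here by loading released ImageNet weights.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Stage 2, fine-tuning: freeze the backbone and train a new task head on the
# downstream data (here an assumed 10-class problem).
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)

print([n for n, p in model.named_parameters() if p.requires_grad])
# ['fc.weight', 'fc.bias']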