2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01055
MetaFormer is Actually What You Need for Vision

Cited by 596 publications (249 citation statements)
References 25 publications
“…When equipped with the RetinaNet 1x (Lin et al., 2017b) setting, ConvFormer-S achieves gains. Table 5: Results of semantic segmentation on the ADE20K (Zhou et al., 2019) validation set. The pre-trained ConvFormer-S is plugged into the Semantic FPN (Kirillov et al., 2019a) and UperNet (Xiao et al., 2018) frameworks, and the training/validation schemes follow (Yu et al., 2021) and (Liu et al., 2021c). Note that the difference in the number of parameters between the two ConvFormer-S variants is due to the use of Semantic FPN vs. UperNet.…”
Section: Results (citation type: mentioning; confidence: 99%)
“…We follow the main training strategies in PoolFormer (Yu et al., 2021). Specifically, CutMix (Yun et al., 2019), RandAugment (Cubuk et al., 2020), Mixup (Zhang et al., 2018), and Label Smoothing regularization (Szegedy et al., 2016) are applied to augment the training data.…”
Section: Methods (citation type: mentioning; confidence: 99%)
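The quoted passage lists the standard augmentation recipe reused from PoolFormer. As a hedged illustration (not the authors' code), the sketch below implements two of those components from their cited definitions: Mixup (Zhang et al., 2018) blends a pair of samples and their one-hot labels with a Beta-sampled weight, and label smoothing (Szegedy et al., 2016) softens hard 0/1 targets; CutMix and RandAugment follow similar compositional patterns.

```python
# Minimal sketch of Mixup and label smoothing, assuming numpy arrays for
# images and one-hot label vectors. Illustrative only, not the paper's code.
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def smooth_labels(one_hot, eps=0.1):
    """Give the true class weight (1 - eps) plus eps/K spread uniformly
    over all K classes, so the target still sums to 1."""
    k = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / k
```

In practice these are applied per batch during training; libraries such as timm bundle the full recipe, but the arithmetic is no more than what is shown here.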
“…Although Jet Image uses CNNs, in this study we use a relatively new model, gated MLP (gMLP) [5], because the performance of newer models such as Vision Transformer (ViT) [6] and MLP-Mixer [7] has improved significantly over the past few years. These new models are based on a similar structure called the MetaFormer [8].…”
Section: Related Work (citation type: mentioning; confidence: 99%)
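The "similar structure" the last citation refers to is the MetaFormer abstraction from the paper above: a residual block where the token-mixing component is pluggable (attention in ViT, spatial MLP in MLP-Mixer/gMLP, pooling in PoolFormer). A minimal sketch of one such block, with the mixer, MLP, and norm passed in as plain callables (an illustrative simplification, not the paper's implementation):

```python
# MetaFormer block sketch: pre-norm residual structure with a pluggable
# token mixer. token_mixer, mlp, and norm are arbitrary callables here;
# in real models they are learned layers (attention/MLP/pooling, etc.).
import numpy as np

def metaformer_block(x, token_mixer, mlp, norm):
    x = x + token_mixer(norm(x))  # sub-block 1: mix information across tokens
    x = x + mlp(norm(x))          # sub-block 2: channel-wise MLP
    return x
```

Swapping `token_mixer` between self-attention, a spatial MLP, or average pooling recovers the different model families named in the quote, which is the paper's central claim: the shared block structure, not the specific mixer, drives much of the performance.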