2021
DOI: 10.48550/arxiv.2106.07477
Preprint

S$^2$-MLP: Spatial-Shift MLP Architecture for Vision

Abstract: Recently, visual Transformer (ViT) and its following works abandon the convolution and exploit the self-attention operation, attaining a comparable or even higher accuracy than CNNs. More recently, MLP-Mixer abandons both the convolution and the self-attention operation, proposing an architecture containing only MLP layers. To achieve cross-patch communications, it devises an additional token-mixing MLP besides the channel-mixing MLP. It achieves promising results when training on an extremely large-scale data…
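To make the token-mixing vs. channel-mixing distinction in the abstract concrete, here is a minimal sketch (illustrative names and shapes; nonlinearities, layer normalization, and skip connections are omitted, so this is not the full MLP-Mixer block):

```python
import numpy as np

def mixer_block(x, w_tok, w_ch):
    """Minimal sketch of MLP-Mixer's two mixing steps.

    x: (num_patches, channels) token matrix.
    w_tok: (num_patches, num_patches) token-mixing weights.
    w_ch: (channels, channels) channel-mixing weights.
    """
    x = w_tok @ x   # token-mixing MLP: mixes information across patches, per channel
    x = x @ w_ch    # channel-mixing MLP: mixes information across channels, per patch
    return x
```

The point of the sketch is that the token-mixing step is the only place where different spatial positions interact; S$^2$-MLP replaces it with a parameter-free spatial shift.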

Cited by 19 publications (28 citation statements)
References 33 publications
“…Following that, gMLP [22] proposed a Spatial Gating Unit to process spatial features. S$^2$-MLP [39] adopts shifted spatial feature maps to augment information communication.…”
Section: Related Work
confidence: 99%
“…Recently, various variants have been developed to achieve a better trade-off between accuracy and computational cost. For example, a shift operation is introduced in S$^2$-MLP [37] and AS-MLP [18] to exchange information across different tokens. Hire-MLP [8] presents a hierarchical rearrangement operation, where the inner-region rearrangement and cross-region rearrangement capture local information and global context, respectively.…”
Section: Related Work
confidence: 99%
“…Researchers first proposed a spatial-shift MLP architecture for vision, called S2MLP [93]. The actual practice is quite simple.…”
Section: Yu et al. from Baidu
confidence: 99%
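The "quite simple" practice the citation refers to is the spatial shift itself. A minimal sketch, assuming the channels are split into four equal groups that are each shifted by one position in one of the four directions (the group boundaries and edge handling here are an illustrative assumption, not the authors' exact implementation):

```python
import numpy as np

def spatial_shift(x):
    """Shift four channel groups of a feature map in four directions,
    so that a subsequent channel-mixing MLP sees neighboring patches.

    x: feature map of shape (H, W, C), with C divisible by 4.
    Border positions keep their original values (no wrap-around).
    """
    out = x.copy()
    g = x.shape[2] // 4                        # channels per group
    out[1:, :, :g]        = x[:-1, :, :g]      # group 0: shift down
    out[:-1, :, g:2*g]    = x[1:, :, g:2*g]    # group 1: shift up
    out[:, 1:, 2*g:3*g]   = x[:, :-1, 2*g:3*g] # group 2: shift right
    out[:, :-1, 3*g:4*g]  = x[:, 1:, 3*g:4*g]  # group 3: shift left
    return out
```

Because the shift has no learnable parameters, all cross-patch communication cost is moved into plain channel-mixing MLPs applied after it.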
“…As shown in the leftmost subdiagram in Figure 13. Models of this design also include FeedForward [37], ResMLP [40], gMLP [88], S2MLP [93], CCS [106], RaftMLP [92], Sparse-MLP(MoE) [103]. Due to the limited computing resources, the patch partition during patch embedding of the single-stage model is usually large, e.g.…”
Section: From Single-stage to Pyramid
confidence: 99%