CycleMLP: A MLP-like Architecture for Dense Prediction
Preprint, 2021
DOI: 10.48550/arxiv.2107.10224

Abstract: This paper presents a simple MLP-like architecture, CycleMLP, which is a versatile backbone for visual recognition and dense predictions, unlike modern MLP architectures, e.g., MLP-Mixer [49], ResMLP [50], and gMLP [35], whose architectures are correlated to image size and thus are infeasible in object detection and segmentation. CycleMLP has two advantages compared to modern approaches. (1) It can cope with various image sizes. (2) It achieves linear computational complexity to image size by using local wi…

Cited by 42 publications (80 citation statements)
References 71 publications
“…For example, MLP-Mixer (Tolstikhin et al., 2021) replaces them both with MLPs applied across different dimensions (i.e., spatial and channel location mixing); ResMLP (Touvron et al., 2021a) is a data-efficient variation on this theme. CycleMLP (Chen et al., 2021), gMLP, and Vision Permutator (Hou et al., 2021) replace one or both blocks with various novel operations. These are all quite performant, which is typically attributed to the novel choice of operations.…”
Section: R W
Citation type: mentioning
confidence: 99%
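The split the quote describes — token (spatial) mixing followed by channel mixing — can be sketched minimally in NumPy. This is a toy illustration with random weights and ReLU standing in for GELU, not the authors' implementation; note that the token-mixing weights are tied to the number of patches, which is exactly why Mixer-style architectures are correlated to image size:

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP; ReLU used here for brevity in place of GELU."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def mixer_block(tokens, token_w, channel_w):
    """One Mixer-style block (residual connections and norms omitted).

    tokens: (num_patches, channels)
    """
    # Token mixing: transpose so the MLP acts across spatial locations.
    y = mlp(tokens.T, *token_w).T
    # Channel mixing: the MLP acts across the feature dimension per location.
    return mlp(y, *channel_w)

rng = np.random.default_rng(0)
P, C, H = 16, 8, 32  # patches, channels, hidden width
# Token-mixing weights have shape (P, H): fixed to the patch count,
# hence fixed to the input image size.
token_w   = (rng.normal(size=(P, H)), np.zeros(H),
             rng.normal(size=(H, P)), np.zeros(P))
channel_w = (rng.normal(size=(C, H)), np.zeros(H),
             rng.normal(size=(H, C)), np.zeros(C))
out = mixer_block(rng.normal(size=(P, C)), token_w, channel_w)
print(out.shape)  # (16, 8)
```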
“…In order to compare with PVT [34], CycleMLP [3], and Hire-MLP [8], we conduct experiments based on RetinaNet [19] and Mask R-CNN [12]. We use the AdamW optimizer with a batch size of 2 images per GPU; the initial learning rate is set to 1e-4 and divided by 10 at the 8th and 11th epochs.…”
Section: Object Detection
Citation type: mentioning
confidence: 99%
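The schedule in the quote (initial learning rate 1e-4, divided by 10 at epochs 8 and 11) is a standard step decay; a minimal sketch, with the function name `step_lr` our own:

```python
def step_lr(epoch, base_lr=1e-4, milestones=(8, 11), gamma=0.1):
    """Learning rate after step decay: multiply by gamma at each
    milestone epoch that has already been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# lr stays at 1e-4 until epoch 8, drops to 1e-5, then to 1e-6 at epoch 11.
print([step_lr(e) for e in (0, 7, 8, 11)])
```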
“…Chen et al. published CycleMLP [98] on arXiv three days after AS-MLP was proposed. Although CycleMLP does not directly shift feature maps, it integrates features at different spatial locations along the channel direction via deformable convolution [99], an approach equivalent to shifting the feature map.…”
Section: Yu et al. from Baidu
Citation type: mentioning
confidence: 99%
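The shifting equivalence the quote describes can be illustrated with a toy channel-wise shift: each channel samples from a different spatial offset, so a subsequent channel-mixing (1×1) MLP aggregates features from neighboring pixels. This is a simplified stand-in with cyclic offsets along the width only, not CycleMLP's actual deformable sampling:

```python
import numpy as np

def channelwise_shift(x, max_offset=1):
    """Shift each channel of a (C, H, W) feature map by an offset that
    cycles through [-max_offset, ..., +max_offset] along the width, so
    that channels carry information from different spatial neighbors."""
    C = x.shape[0]
    window = 2 * max_offset + 1
    out = np.empty_like(x)
    for c in range(C):
        offset = -max_offset + (c % window)
        out[c] = np.roll(x[c], shift=offset, axis=1)  # shift along width
    return out

x = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
y = channelwise_shift(x)  # channel 0 shifted by -1, channel 1 unshifted
```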
“…Specifically, the whole architecture contains four stages, where the feature resolution reduces from H/4 × W/4 to H/32 × W/32 and the output dimension increases accordingly. Networks based on this design include Sparse MLP [91], HireMLP [100], AS-MLP [95], and CycleMLP [98]. Patch embedding can be equivalently achieved by a convolution layer whose kernel size and stride both equal the patch size.…”
Section: From Single-stage to Pyramid
Citation type: mentioning
confidence: 99%
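The quoted equivalence — patch embedding as a convolution with kernel size and stride equal to the patch size — amounts to flattening non-overlapping patches and applying a single linear projection. A NumPy sketch that also walks the four pyramid stages (downsampling strides 4, 2, 2, 2 give H/4 down to H/32; the channel widths here are illustrative, not from any particular model):

```python
import numpy as np

def patch_embed(x, patch, w):
    """Patch embedding: equivalent to a conv with kernel == stride == patch.

    x: (H, W, C_in), w: (patch*patch*C_in, C_out) -> (H/patch, W/patch, C_out)
    """
    H, W, C = x.shape
    p = patch
    # Cut into non-overlapping p x p patches and flatten each one.
    patches = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(H // p, W // p, p * p * C)
    return patches @ w  # one linear projection per patch

H = W = 32
x = np.random.default_rng(0).normal(size=(H, W, 3))
stages, feat, cin, cout = [], x, 3, 16
for stride in (4, 2, 2, 2):  # overall resolutions: H/4, H/8, H/16, H/32
    w = np.zeros((stride * stride * cin, cout))
    feat = patch_embed(feat, stride, w)
    stages.append(feat.shape)
    cin, cout = cout, cout * 2
print(stages)  # [(8, 8, 16), (4, 4, 32), (2, 2, 64), (1, 1, 128)]
```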