CycleMLP: A MLP-like Architecture for Dense Prediction
Preprint, 2021
DOI: 10.48550/arxiv.2107.10224

Abstract: This paper presents a simple MLP-like architecture, CycleMLP, which is a versatile backbone for visual recognition and dense predictions, unlike modern MLP architectures, e.g., MLP-Mixer [49], ResMLP [50], and gMLP [35], whose architectures are correlated to image size and thus are infeasible in object detection and segmentation. CycleMLP has two advantages compared to modern approaches. (1) It can cope with various image sizes. (2) It achieves linear computational complexity to image size by using local wi…

Cited by 42 publications (80 citation statements)
References 71 publications
“…For example, MLP-Mixer (Tolstikhin et al., 2021) replaces them both with MLPs applied across different dimensions (i.e., spatial and channel location mixing); ResMLP (Touvron et al., 2021a) is a data-efficient variation on this theme. CycleMLP (Chen et al., 2021), gMLP, and Vision Permutator (Hou et al., 2021) replace one or both blocks with various novel operations. These are all quite performant, which is typically attributed to the novel choice of operations.…”
Section: R W
Citation type: mentioning
confidence: 99%
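The split the quote describes — token (spatial) mixing followed by channel mixing — can be sketched minimally in NumPy. This is a toy illustration with random weights and ReLU standing in for GELU, not the authors' implementation; note that the token-mixing weights are tied to the number of patches, which is exactly why Mixer-style architectures are correlated to image size:

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP; ReLU used here for brevity in place of GELU."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def mixer_block(tokens, token_w, channel_w):
    """One Mixer-style block (residual connections and norms omitted).

    tokens: (num_patches, channels)
    """
    # Token mixing: transpose so the MLP acts across spatial locations.
    y = mlp(tokens.T, *token_w).T
    # Channel mixing: the MLP acts across the feature dimension per location.
    return mlp(y, *channel_w)

rng = np.random.default_rng(0)
P, C, H = 16, 8, 32  # patches, channels, hidden width
# Token-mixing weights have shape (P, H): fixed to the patch count,
# hence fixed to the input image size.
token_w   = (rng.normal(size=(P, H)), np.zeros(H),
             rng.normal(size=(H, P)), np.zeros(P))
channel_w = (rng.normal(size=(C, H)), np.zeros(H),
             rng.normal(size=(H, C)), np.zeros(C))
out = mixer_block(rng.normal(size=(P, C)), token_w, channel_w)
print(out.shape)  # (16, 8)
```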
“…In order to compare with PVT [34], CycleMLP [3], and Hire-MLP [8], we conduct experiments based on RetinaNet [19] and Mask R-CNN [12]. We use the AdamW optimizer with a batch size of 2 images per GPU; the initial learning rate is set to 1e-4 and divided by 10 at the 8th and 11th epochs.…”
Section: Object Detection
Citation type: mentioning
confidence: 99%
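The schedule in the quote (initial learning rate 1e-4, divided by 10 at epochs 8 and 11) is a standard step decay; a minimal sketch, with the function name `step_lr` our own:

```python
def step_lr(epoch, base_lr=1e-4, milestones=(8, 11), gamma=0.1):
    """Learning rate after step decay: multiply by gamma at each
    milestone epoch that has already been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# lr stays at 1e-4 until epoch 8, drops to 1e-5, then to 1e-6 at epoch 11.
print([step_lr(e) for e in (0, 7, 8, 11)])
```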
“…Chen et al. published CycleMLP [98] on arXiv three days after AS-MLP was proposed. Although CycleMLP does not directly shift feature maps, it integrates features at different spatial locations along the channel direction via deformable convolution [99], an approach equivalent to shifting the feature map.…”
Section: Yu et al. from Baidu
Citation type: mentioning
confidence: 99%
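The shifting equivalence the quote describes can be illustrated with a toy channel-wise shift: each channel samples from a different spatial offset, so a subsequent channel-mixing (1×1) MLP aggregates features from neighboring pixels. This is a simplified stand-in with cyclic offsets along the width only, not CycleMLP's actual deformable sampling:

```python
import numpy as np

def channelwise_shift(x, max_offset=1):
    """Shift each channel of a (C, H, W) feature map by an offset that
    cycles through [-max_offset, ..., +max_offset] along the width, so
    that channels carry information from different spatial neighbors."""
    C = x.shape[0]
    window = 2 * max_offset + 1
    out = np.empty_like(x)
    for c in range(C):
        offset = -max_offset + (c % window)
        out[c] = np.roll(x[c], shift=offset, axis=1)  # shift along width
    return out

x = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
y = channelwise_shift(x)  # channel 0 shifted by -1, channel 1 unshifted
```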
“…Specifically, the whole architecture contains four stages, where the feature resolution reduces from H/4 × W/4 to H/32 × W/32 and the output dimension increases accordingly. Networks based on this design include Sparse MLP [91], HireMLP [100], AS-MLP [95], and CycleMLP [98]. Patch embedding can be equivalently achieved by a convolution layer whose kernel size and stride both equal the patch size.…”
Section: From Single-stage to Pyramid
Citation type: mentioning
confidence: 99%
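The quoted equivalence — patch embedding as a convolution with kernel size and stride equal to the patch size — amounts to flattening non-overlapping patches and applying a single linear projection. A NumPy sketch that also walks the four pyramid stages (downsampling strides 4, 2, 2, 2 give H/4 down to H/32; the channel widths here are illustrative, not from any particular model):

```python
import numpy as np

def patch_embed(x, patch, w):
    """Patch embedding: equivalent to a conv with kernel == stride == patch.

    x: (H, W, C_in), w: (patch*patch*C_in, C_out) -> (H/patch, W/patch, C_out)
    """
    H, W, C = x.shape
    p = patch
    # Cut into non-overlapping p x p patches and flatten each one.
    patches = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(H // p, W // p, p * p * C)
    return patches @ w  # one linear projection per patch

H = W = 32
x = np.random.default_rng(0).normal(size=(H, W, 3))
stages, feat, cin, cout = [], x, 3, 16
for stride in (4, 2, 2, 2):  # overall resolutions: H/4, H/8, H/16, H/32
    w = np.zeros((stride * stride * cin, cout))
    feat = patch_embed(feat, stride, w)
    stages.append(feat.shape)
    cin, cout = cout, cout * 2
print(stages)  # [(8, 8, 16), (4, 4, 32), (2, 2, 64), (1, 1, 128)]
```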