2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01166

Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs

Cited by 636 publications (234 citation statements). References 50 publications.
“…In computer vision, a line of works [Ding et al., 2019; Guo et al., 2020; Ding et al., 2021; Cao et al., 2022] explored using structural re-parameterization to create 2D convolution kernels. However, most of these works are limited to the vision domain and utilize only short-range convolution kernels (e.g., 7 × 7), with only one exception [Ding et al., 2022], which scales the convolution kernel size to 31 × 31 with an optimized CUDA kernel. Our SGConv kernel is a special parameterization of global convolution kernels that tackles LRD and showcases the extensibility of re-parameterized kernels.…”
Section: Related Work (mentioning); confidence: 99%
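The re-parameterization idea referenced above can be made concrete with a small sketch (my own illustration, not code from any of the cited papers): a parallel small-kernel branch is equivalent to a single large kernel whose weights are the large kernel plus the zero-padded small kernel, so the extra branch can be merged away at inference time.

# Toy equivalence check (illustrative only): a depthwise 31x31 branch plus a
# parallel depthwise 3x3 branch equals one depthwise 31x31 conv whose kernel
# is the sum of the large kernel and the zero-padded small kernel.
import torch
import torch.nn.functional as F

def merge_branches(w_large, w_small):
    # Zero-pad the small kernel to the large kernel's spatial size and add.
    pad = (w_large.shape[-1] - w_small.shape[-1]) // 2
    return w_large + F.pad(w_small, [pad] * 4)

x = torch.randn(1, 8, 56, 56)
w31 = torch.randn(8, 1, 31, 31)  # depthwise large kernel (groups = channels)
w3 = torch.randn(8, 1, 3, 3)     # parallel small kernel
y_two = (F.conv2d(x, w31, padding=15, groups=8)
         + F.conv2d(x, w3, padding=1, groups=8))
y_one = F.conv2d(x, merge_branches(w31, w3), padding=15, groups=8)
print(torch.allclose(y_two, y_one, atol=1e-4))  # True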
“…where Parallel_{3×3,5×5,7×7} contains multiple branches of 3 × 3, 5 × 5, and 7 × 7 convolution layers. Following (Ding et al., 2022; Guo et al., 2022), we apply dilated depthwise convolutions with kernel sizes 5 × 5 and 7 × 7 and dilation rates of 2 and 3 to obtain a larger receptive field.…”
Section: MCA (mentioning); confidence: 99%
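A minimal sketch of the multi-branch pattern quoted above, assuming PyTorch depthwise convolutions with same-style padding; the class name and exact branch set are illustrative, not taken from the cited paper:

import torch
import torch.nn as nn

class ParallelDWBranches(nn.Module):
    """Hypothetical module: parallel depthwise convolutions, including dilated
    5x5 (dilation 2) and 7x7 (dilation 3) branches, summed elementwise."""
    def __init__(self, channels):
        super().__init__()
        specs = [(3, 1), (5, 1), (7, 1), (5, 2), (7, 3)]  # (kernel, dilation)
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=d * (k - 1) // 2,
                      dilation=d, groups=channels)
            for k, d in specs
        )

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)

x = torch.randn(2, 64, 32, 32)
print(ParallelDWBranches(64)(x).shape)  # torch.Size([2, 64, 32, 32])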
“…ConvNeXt built a pure CNN family based on ResNet (He et al., 2016a), which performs on par with or slightly better than ViT by borrowing its training procedure and macro/micro-level architecture designs. RepLKNet (Ding et al., 2022) follows the large-kernel design and proposes to learn long-range relations by adopting kernel sizes as large as 31 × 31 to enlarge effective receptive fields. Although encouraging performance has been achieved by the above methods, their computation costs are relatively large.…”
Section: Introduction (mentioning); confidence: 99%
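As a rough illustration of why a 31 × 31 kernel is affordable in this setting (my own back-of-the-envelope check, not RepLKNet code): the large kernel is applied depthwise, so the parameter count scales with C·K² rather than C²·K².

import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

C, K = 256, 31
dense = nn.Conv2d(C, C, K, padding=K // 2, bias=False)                # C*C*K*K weights
depthwise = nn.Conv2d(C, C, K, padding=K // 2, groups=C, bias=False)  # C*K*K weights
print(n_params(dense), n_params(depthwise))  # 62980096 vs 246016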
“…Thus, information can be gathered from a large region. Inspired by this characteristic of Transformers, a series of works has been proposed to design better CNNs [48, 40, 12, 16]. ConvMixer [48] utilizes large-kernel convolutions to build the model and achieves performance competitive with ViT [15].…”
Section: Related Work (mentioning); confidence: 99%
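For reference, a hedged sketch of a ConvMixer-style block as commonly described (a paraphrase, not the authors' code): a large-kernel depthwise convolution with a residual connection mixes spatial locations, and a 1 × 1 convolution mixes channels, each followed by GELU and BatchNorm.

import torch
import torch.nn as nn

class ConvMixerBlock(nn.Module):
    def __init__(self, dim, kernel_size=9):
        super().__init__()
        # Spatial mixing: large-kernel depthwise conv inside a residual branch.
        self.spatial = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )
        # Channel mixing: pointwise (1x1) convolution.
        self.channel = nn.Sequential(
            nn.Conv2d(dim, dim, 1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x):
        x = x + self.spatial(x)
        return self.channel(x)

print(ConvMixerBlock(64)(torch.randn(1, 64, 32, 32)).shape)  # (1, 64, 32, 32)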