2022
DOI: 10.1109/tpds.2021.3084813

Optimizing Depthwise Separable Convolution Operations on GPUs

Abstract: The depthwise separable convolution is commonly seen in convolutional neural networks (CNNs), and is widely used to reduce the computation overhead of a standard multi-channel 2D convolution. Existing implementations of depthwise separable convolutions target accelerating model training with large batch sizes with a large number of samples to be processed at once. Such approaches are inadequate for small-batch-sized model training and the typical scenario of model inference where the model takes in a few sampl…
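
As a rough illustration of the computation reduction the abstract refers to, the sketch below counts multiply-accumulate operations (MACs) for a standard 2D convolution and for a depthwise convolution followed by a 1x1 pointwise convolution. The function names and the layer sizes in the usage note are assumptions chosen for illustration and are not taken from the paper; stride 1 and an output of the same spatial size as the input are also assumed.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution with an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def reduction_ratio(c_out, k):
    """Separable cost divided by standard cost: 1/c_out + 1/k**2."""
    return 1.0 / c_out + 1.0 / (k * k)


if __name__ == "__main__":
    # Hypothetical layer: 56 x 56 feature map, 128 -> 128 channels, 3 x 3 filter.
    std = standard_conv_macs(56, 56, 128, 128, 3)
    sep = depthwise_separable_macs(56, 56, 128, 128, 3)
    print(std, sep, sep / std, reduction_ratio(128, 3))
```

For the hypothetical layer above, the separable form needs roughly 12% of the MACs of the standard convolution. Because the arithmetic shrinks much more than the data that must be moved, such layers tend to be limited by memory access rather than computation on a GPU, which is the point several of the citing papers below make.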

Cited by 48 publications (10 citation statements)
References 40 publications
“…As explained in section 1, depthwise-separable convolutions (DWSConv) [7] are a common design choice for reducing the computational cost of DL models. However, because DWSConv involves far fewer floating point operations than standard 2D convolutions, its execution time on a GPU is dominated by the memory access latency [15]. To overcome this bottleneck, existing implementations of DWSConv try to accelerate execution by using large batch sizes.…”
Section: Analysis and Discussion (mentioning)
confidence: 99%
“…However, the runtime is longer than that of the original PSMNet. As mentioned in [47,48], the reason may be that the cuDNN library does not fully support depthwise convolutions and pointwise convolutions. For the GPU platform of the cuDNN library, the optimization of classic convolutions on end-to-end training is better.…”
Section: Discussion (mentioning)
confidence: 99%
“…The feature mixing operation in equation 2 with the following pointwise activation function (i.e., ReLU) may offer a sufficient rank of the feature with the efficient operation. However, the inverted bottleneck and the variants below usually need a large expansion ratio ρ > 1 to secure the expressiveness (Sandler et al., 2018; Howard et al., 2019; Tan & Le, 2019), so the actual speed is hampered by the grouped operation that requires more optimization on GPU (Gibson et al., 2020; Lu et al., 2021).…”
Section: Efficient Building Blocks (mentioning)
confidence: 99%