Proceedings of the 13th Annual Workshop on General Purpose Processing Using Graphics Processing Unit 2020
DOI: 10.1145/3366428.3380771

Automatic generation of specialized direct convolutions for mobile GPUs

Abstract: Convolutional Neural Networks (CNNs) are a powerful and versatile tool for performing computer vision tasks in both resource-constrained settings and server-side applications. Most GPU hardware vendors provide highly tuned libraries for CNNs, such as Nvidia's cuDNN or the ARM Compute Library. Such libraries are the basis for higher-level, commonly used machine-learning frameworks such as PyTorch or Caffe, abstracting them away from vendor-specific implementation details. However, writing optimized parallel code fo…
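
As context for the abstract, here is a minimal scalar sketch of a direct convolution, the operation the paper generates specialized kernels for. The paper targets mobile GPUs via code generation, so this C loop nest is only an illustration of the direct method (each output element computed as an explicit sum over the filter window, with no im2col or matrix-multiply lowering); the NCHW layout, parameter names, and zero-padding convention are assumptions for illustration, not the authors' generated code.

/* Illustrative sketch of a direct 2D convolution (assumed NCHW
 * layout, zero padding); not the paper's generated OpenCL code. */
void direct_conv2d(const float *in,  /* [C][H][W]         */
                   const float *w,   /* [K][C][R][S]      */
                   float *out,       /* [K][H_out][W_out] */
                   int C, int H, int W,
                   int K, int R, int S,
                   int stride, int pad)
{
    int H_out = (H + 2 * pad - R) / stride + 1;
    int W_out = (W + 2 * pad - S) / stride + 1;

    for (int k = 0; k < K; ++k)                 /* output channel */
        for (int y = 0; y < H_out; ++y)         /* output row     */
            for (int x = 0; x < W_out; ++x) {   /* output column  */
                float acc = 0.0f;
                for (int c = 0; c < C; ++c)     /* input channel  */
                    for (int r = 0; r < R; ++r) /* filter row     */
                        for (int s = 0; s < S; ++s) {
                            int iy = y * stride - pad + r;
                            int ix = x * stride - pad + s;
                            /* Skip out-of-bounds taps: zero padding. */
                            if (iy >= 0 && iy < H && ix >= 0 && ix < W)
                                acc += in[(c * H + iy) * W + ix]
                                     * w[((k * C + c) * R + r) * S + s];
                        }
                out[(k * H_out + y) * W_out + x] = acc;
            }
}

Specializing such a kernel means fixing C, K, R, S, stride, and pad at code-generation time so the compiler can fully unroll, vectorize, and map the loops to the target GPU.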

Cited by 12 publications (8 citation statements) · References 12 publications

Citation statements, ordered by relevance:
“…Figure 1. The entire optimization flow in Lift: tuning constraint inference; tiling, parallelization, and padding tuning (GPGPU'20 [14]); code generation (CGO'17 [21]).…”
Section: Tuning (mentioning)
confidence: 99%
“…The focus is on the convolution, the most compute-intensive operation [10] of a CNN architecture. Prior work [14] has shown how this kernel can be expressed and optimized in Lift. In contrast to prior work, the mapping of parallelism is performed automatically using constraints.…”
Section: Tuning (mentioning)
confidence: 99%
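
To make "most compute-intensive" concrete, here is a back-of-the-envelope C sketch of the multiply-accumulate (MAC) count of a single convolution layer compared with its bias add; the VGG-style shapes are illustrative assumptions, not figures from the cited papers.

/* Rough cost model: MACs of one conv layer vs. its bias add.
 * Shapes (256->256 channels, 56x56 output, 3x3 filter) are assumed. */
#include <stdio.h>

int main(void)
{
    long long C = 256, K = 256;   /* input / output channels */
    long long H = 56, W = 56;     /* output spatial size     */
    long long R = 3, S = 3;       /* filter size             */

    long long macs = K * H * W * C * R * S;   /* ~1.85e9 MACs */
    printf("conv MACs: %lld (~%.1f GFLOPs at 2 ops/MAC)\n",
           macs, 2.0 * macs / 1e9);

    long long bias_ops = K * H * W;           /* one add per output */
    printf("bias adds: %lld (%.4f%% of conv work)\n",
           bias_ops, 100.0 * bias_ops / (2.0 * macs));
    return 0;
}

With these shapes the convolution performs roughly four orders of magnitude more arithmetic than the surrounding elementwise work, which is why optimization effort concentrates on this kernel.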
“…In the future, we can combine DNNFusion's high-level abstraction with existing domain-specific polyhedral analysis. Similarly, another promising direction would be to integrate DNNFusion into other compilation-based DNN frameworks [25,45] or other popular general tensor/matrix/linear-algebra computation frameworks, such as MLIR [40], Tiramisu [4], TACO [33,34], Halide [56], and LGen [38,64]. There also exist several other frameworks that optimize machine learning with operator fusion or fusion-based ideas.…”
Section: Related Work (mentioning)
confidence: 99%
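
As a minimal illustration of the operator fusion this excerpt refers to, here is a sketch of the simplest case: fusing two elementwise operators so the intermediate tensor is never materialized. Names and shapes are illustrative assumptions, not DNNFusion's API.

/* Unfused: bias-add writes a temporary that ReLU immediately rereads,
 * costing an extra round trip through memory. */
#include <stddef.h>

static inline float relu(float v) { return v > 0.0f ? v : 0.0f; }

void bias_relu_unfused(const float *x, const float *b,
                       float *tmp, float *y, size_t n, size_t c)
{
    for (size_t i = 0; i < n; ++i) tmp[i] = x[i] + b[i % c];
    for (size_t i = 0; i < n; ++i) y[i] = relu(tmp[i]);
}

/* Fused: one loop, no temporary buffer; this is the kind of rewrite
 * fusion-based DNN compilers apply automatically across the graph. */
void bias_relu_fused(const float *x, const float *b,
                     float *y, size_t n, size_t c)
{
    for (size_t i = 0; i < n; ++i) y[i] = relu(x[i] + b[i % c]);
}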
“…Compiler optimization. There has been much interest in autotuning DNN code generators [10,23,37,48,64,72]. Polyhedral compilers are particularly well suited [72,83] as they have built-in abstractions for exploiting parallelism and memory layout in a principled way.…”
Section: Interpolating Between Models (mentioning)
confidence: 99%
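
As a small illustration of the loop restructuring such compilers automate, here is a sketch of loop tiling applied to a matrix transpose, a classic case where tiling keeps the working set of both arrays cache-resident; the tile size and names are illustrative assumptions, not output of any polyhedral tool.

/* Tiled transpose: the untiled version strides through memory on
 * either reads or writes; processing TILE x TILE blocks keeps both
 * the source and destination blocks in cache at once. */
#include <stddef.h>

#define TILE 32  /* assumed tile size; autotuners search this */

void transpose_tiled(const float *a, float *b, size_t n, size_t m)
{
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < m; jj += TILE)
            /* Intra-tile loops; bounds clamp the ragged edges. */
            for (size_t i = ii; i < ii + TILE && i < n; ++i)
                for (size_t j = jj; j < jj + TILE && j < m; ++j)
                    b[j * n + i] = a[i * m + j];
}

Polyhedral frameworks derive such tilings (and legal parallelization of the tile loops) from the loop nest's iteration domain rather than by hand, which is what makes them attractive for DNN code generation.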