Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation 2021
DOI: 10.1145/3453483.3454083

DNNFusion: accelerating deep neural networks execution with advanced operator fusion

Abstract: Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN, that aims to improve the efficiency of DNN inference. However, these fr…
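
To make the role of operator fusion concrete, the following sketch contrasts an unfused chain of element-wise operators (each writing a full intermediate tensor) with a fused form that traverses the data once. It is a minimal NumPy illustration under assumed shapes and operators, not DNNFusion's actual code generation.

```python
# Minimal sketch (not the paper's implementation) of why operator fusion helps:
# the unfused chain materializes every intermediate tensor, while the fused
# expression computes the same result in a single conceptual pass.
import numpy as np

def unfused(x, scale, bias):
    t1 = x * scale              # operator 1 writes a full intermediate tensor
    t2 = t1 + bias              # operator 2 writes another intermediate
    return np.maximum(t2, 0.0)  # operator 3 (ReLU) makes a third pass

def fused(x, scale, bias):
    # Conceptually one kernel: a fusing compiler emits a single loop so the
    # intermediates stay in registers/cache instead of going through memory.
    return np.maximum(x * scale + bias, 0.0)

x = np.random.rand(1 << 20).astype(np.float32)
assert np.allclose(unfused(x, 2.0, -1.0), fused(x, 2.0, -1.0))
```

In a real framework the fused form is emitted as one kernel, eliminating the intermediate memory traffic that dominates the unfused version.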

Cited by 75 publications (20 citation statements)
References 74 publications

“…Existing work [23,24] has identified such a problem in a more general scope and introduces a compiler-level optimization technique to exploit inter-operation and intra-operation parallelism. These solutions could further improve the parallel FFN computation efficiency in MoE systems.…”
Section: Discussion
confidence: 99%
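
As a rough illustration of the inter-operator parallelism this statement refers to, the sketch below dispatches independent expert FFNs concurrently instead of one after another. The thread-pool scheduling, shapes, and expert count are assumptions made for illustration, not the cited compilers' technique.

```python
# Hypothetical sketch: independent operators (here, per-expert FFNs) have no
# data dependency, so they can be run side by side (inter-operator parallelism),
# while each matmul can still be split across cores (intra-operator parallelism).
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def ffn(x, w1, w2):
    # One feed-forward "operator"; intra-operator parallelism would come from
    # the BLAS backend splitting these matmuls across cores.
    return np.maximum(x @ w1, 0.0) @ w2

x = np.random.rand(64, 256).astype(np.float32)
experts = [(np.random.rand(256, 1024).astype(np.float32),
            np.random.rand(1024, 256).astype(np.float32))
           for _ in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Inter-operator parallelism: the four experts are dispatched concurrently.
    outputs = list(pool.map(lambda w: ffn(x, *w), experts))

print([o.shape for o in outputs])  # four (64, 256) results
```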
“…(2) Graph-level: DNNs with many operators are commonly represented as directed acyclic graphs (DAGs), which use nodes to represent operators and edges to represent the data flow and dependency [17]. Single-model DAGs are usually sequential with limited parallelism, like VGG, ResNets, MobileNets and EfficientNets, which have only one or two branches and thus expose a small scheduling space [23].…”
Section: A. Challenges for Multi-tenant DL Computing
confidence: 99%
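
A small, self-contained sketch of the DAG representation described above: operators as nodes, data dependencies as edges, with a topological traversal whose ready-set size approximates how much scheduling space a mostly sequential model exposes. The tiny graph and the "width" measure are illustrative assumptions, not the cited papers' exact model.

```python
# Illustrative DAG of a mostly sequential (ResNet-like) block: only the
# residual edge adds a second branch, so the ready set rarely exceeds one.
from collections import defaultdict, deque

edges = [("input", "conv1"), ("conv1", "conv2"), ("conv2", "add"),
         ("input", "add"), ("add", "relu")]

succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
for u, v in edges:
    succ[u].append(v)
    indeg[v] += 1
    nodes.update((u, v))

# Kahn's algorithm: the size of the ready set at each step approximates how
# many operators could be scheduled in parallel (the scheduling space).
ready = deque(n for n in nodes if indeg[n] == 0)
order, max_width = [], 0
while ready:
    max_width = max(max_width, len(ready))
    n = ready.popleft()
    order.append(n)
    for m in succ[n]:
        indeg[m] -= 1
        if indeg[m] == 0:
            ready.append(m)

print(order, "max parallel width:", max_width)  # width stays at 1 here
```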
“…AStitch is orthogonal to the above studies in that it focuses on generating high-performance GPU kernels just-in-time, given a large group of memory-intensive operators. Niu et al. [34] study fusion optimization for inference on mobile devices, while AStitch targets both training and inference on industrial GPU workloads, showing different targets and techniques. Zheng et al. [57] explore operator stitching with shared memory, and use a two-level cost-model-based method for fusion pattern decision and codegen schedule selection.…”
Section: Related Work
confidence: 99%
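
To illustrate what a fusion-pattern decision can look like, the sketch below greedily groups adjacent memory-intensive (element-wise) operators into a single fused kernel and keeps compute-intensive operators as group boundaries. The one-line classification rule is a hypothetical stand-in for the cost models used by the systems cited above.

```python
# Hypothetical greedy fusion-pattern decision over a flat operator chain:
# consecutive memory-intensive ops are stitched into one group (one kernel),
# compute-intensive ops stay as standalone groups.
MEMORY_INTENSIVE = {"add", "mul", "relu", "cast", "broadcast"}

def fuse_chain(ops):
    groups, current = [], []
    for op in ops:
        if op in MEMORY_INTENSIVE:
            current.append(op)          # keep stitching into the open group
        else:
            if current:
                groups.append(current)  # close the memory-intensive group
                current = []
            groups.append([op])         # compute-intensive op stays alone
    if current:
        groups.append(current)
    return groups

print(fuse_chain(["matmul", "add", "relu", "matmul", "mul", "cast"]))
# [['matmul'], ['add', 'relu'], ['matmul'], ['mul', 'cast']]
```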