2021
DOI: 10.1109/tpds.2021.3138862
NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training

Abstract: Deep neural networks (DNNs) are increasingly deployed in various image recognition and natural language processing applications. The continuous demand for accuracy and high performance has led to innovations in DNN design and a proliferation of new operators. However, existing DNN training frameworks such as PyTorch and TensorFlow only support a limited range of operators and rely on hand-optimized libraries to provide efficient implementations for these operators. To evaluate novel neural networks with new op…

Cited by 12 publications (5 citation statements) | References 18 publications
“…Recently, deep learning compilers (e.g., TVM, MLIR (Lattner et al., 2020b), and Glow) have demonstrated the ability to dramatically reduce inference latencies, training times (Zheng et al., 2022), and memory usage. These compilers function by extracting intermediate-level representations (IRs) of the DNNs from the representations produced by the frameworks, and performing various optimizations (e.g., kernel fusion (Ashari et al., 2015), vectorization (Maleki et al., 2011), and memory planning) on those IRs.…”
Section: BraggHLS: High-Level Synthesis for Low-Latency Deep Neural N… (mentioning)
confidence: 99%
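The citation above describes how deep learning compilers apply graph-level optimizations such as kernel fusion to the extracted IR. Below is a minimal plain-Python sketch of that idea under stated assumptions: the `Op` and `fuse_elementwise` names are hypothetical and are not the API of TVM, MLIR, or Glow; it only illustrates collapsing a chain of elementwise operators into a single composite node.

```python
# Toy illustration of elementwise kernel fusion on a made-up IR.
# All names here are hypothetical; this is not TVM/MLIR/Glow code.
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class Op:
    """One node of a toy dataflow IR: a name and an elementwise function."""
    name: str
    fn: Callable[[np.ndarray], np.ndarray]


def fuse_elementwise(ops: List[Op]) -> Op:
    """Fuse a chain of elementwise ops into one composite op.

    The graph then dispatches a single node instead of several; a real
    compiler would also eliminate the intermediate buffers between ops.
    """
    def fused(x: np.ndarray) -> np.ndarray:
        for op in ops:
            x = op.fn(x)
        return x

    return Op(name="+".join(op.name for op in ops), fn=fused)


if __name__ == "__main__":
    # Unfused pipeline: ReLU followed by scaling, two separate "kernels".
    pipeline = [Op("relu", lambda x: np.maximum(x, 0.0)),
                Op("scale2", lambda x: x * 2.0)]
    fused = fuse_elementwise(pipeline)

    x = np.random.randn(4, 4).astype(np.float32)
    expected = pipeline[1].fn(pipeline[0].fn(x))
    assert np.allclose(fused.fn(x), expected)
    print("fused op:", fused.name)
```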
“…Therefore, the convolution products for every point are added together, and the result of this computation is the activation output, called the output feature map. Moreover, all input feature maps are processed together as a batch (Yang, 2023), which improves the learning of the filter weights. Additionally, there are other optional layers, as observed in the figure above, such as nonlinearity (which generally evaluates the maximum of two intersecting functions), pooling (which gives the network invariance to small distortions), and normalization, which controls the input distribution through the layers.…”
Section: Neural Network (mentioning)
confidence: 99%
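The statement above walks through how per-channel convolution products are summed into a single activation of the output feature map and how inputs are processed as a batch. The following is a minimal NumPy sketch of that computation, assuming no padding and unit stride; `conv2d_naive` and its shapes are illustrative choices, not code from the cited paper.

```python
import numpy as np


def conv2d_naive(inputs: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Naive 2-D convolution (cross-correlation, as in most DNN frameworks).

    inputs:  (batch, in_channels, H, W)            -- input feature maps
    filters: (out_channels, in_channels, kH, kW)
    returns: (batch, out_channels, H-kH+1, W-kW+1) -- output feature maps
    """
    n, c_in, h, w = inputs.shape
    c_out, _, kh, kw = filters.shape
    out = np.zeros((n, c_out, h - kh + 1, w - kw + 1), dtype=inputs.dtype)

    for b in range(n):                      # every sample in the batch
        for oc in range(c_out):             # every output feature map
            for i in range(h - kh + 1):
                for j in range(w - kw + 1):
                    # Products over all input channels are summed into
                    # one activation of the output feature map.
                    patch = inputs[b, :, i:i + kh, j:j + kw]
                    out[b, oc, i, j] = np.sum(patch * filters[oc])
    return out


if __name__ == "__main__":
    x = np.random.randn(2, 3, 8, 8).astype(np.float32)   # batch of 2, 3 channels
    w = np.random.randn(4, 3, 3, 3).astype(np.float32)   # 4 output feature maps
    print(conv2d_naive(x, w).shape)  # (2, 4, 6, 6)
```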
“…Halide [49] introduces the concept of compute and schedule to separate the software definition from its optimization, while TVM [9] generalizes the concept and allows users to use the tensorize primitive to manually lower parts of the software to spatial accelerators. Automatic schedulers such as the Halide Scheduler [2, 30, 38], FlexTensor [70], ProTuner [19], ALT [63], Rammer [34], NeoFlow [69], and Ansor [68] focus on general-purpose hardware and ignore the mapping problem for spatial accelerators such as Tensor Cores. The polyhedral model is widely used in compilers for constrained optimization [5, 56, 57].…”
Section: Related Work (mentioning)
confidence: 99%
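The compute/schedule split mentioned above keeps what is computed separate from how the loop nest is ordered. Below is a small plain-Python sketch of that separation under stated assumptions: it is not Halide or TVM syntax, the tile size and function names are hypothetical, and both "schedules" realize the same matmul definition C[i, j] = sum_k A[i, k] * B[k, j].

```python
import numpy as np


def matmul_naive(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Naive 'schedule': plain triple loop over the compute definition."""
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(m):
        for j in range(n):
            for kk in range(k):
                c[i, j] += a[i, kk] * b[kk, j]
    return c


def matmul_tiled(a: np.ndarray, b: np.ndarray, tile: int = 16) -> np.ndarray:
    """Tiled 'schedule': the same computation with the loops blocked for
    locality. The compute definition is untouched; only the loop nest changes."""
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n), dtype=a.dtype)
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                c[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile] @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return c


if __name__ == "__main__":
    a = np.random.randn(32, 32).astype(np.float32)
    b = np.random.randn(32, 32).astype(np.float32)
    # Both schedules produce the same result, up to float rounding.
    assert np.allclose(matmul_naive(a, b), matmul_tiled(a, b), atol=1e-3)
```

Auto-schedulers such as those listed in the citation search over choices of this kind (tiling, reordering, vectorization) while leaving the compute definition fixed.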