2021
DOI: 10.1109/tpds.2021.3138862
NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training

Abstract: Deep neural networks (DNNs) are increasingly deployed in various image recognition and natural language processing applications. The continuous demand for accuracy and high performance has led to innovations in DNN design and a proliferation of new operators. However, existing DNN training frameworks such as PyTorch and TensorFlow only support a limited range of operators and rely on hand-optimized libraries to provide efficient implementations for these operators. To evaluate novel neural networks with new op…

Cited by 12 publications (5 citation statements) | References 18 publications
“…Recently, deep learning compilers (e.g., TVM, MLIR (Lattner et al., 2020b), and Glow) have demonstrated the ability to dramatically reduce inference latencies, training times (Zheng et al., 2022), and memory usage. These compilers function by extracting intermediate-level representations (IRs) of the DNNs from the representations produced by the frameworks, and performing various optimizations (e.g., kernel fusion (Ashari et al., 2015), vectorization (Maleki et al., 2011), and memory planning) on those IRs.…”
Section: BraggHLS: High-Level Synthesis for Low-Latency Deep Neural N… (mentioning)
confidence: 99%
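The citation above describes how deep learning compilers apply graph-level optimizations such as kernel fusion to the extracted IR. Below is a minimal plain-Python sketch of that idea under stated assumptions: the `Op` and `fuse_elementwise` names are hypothetical and are not the API of TVM, MLIR, or Glow; it only illustrates collapsing a chain of elementwise operators into a single composite node.

```python
# Toy illustration of elementwise kernel fusion on a made-up IR.
# All names here are hypothetical; this is not TVM/MLIR/Glow code.
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class Op:
    """One node of a toy dataflow IR: a name and an elementwise function."""
    name: str
    fn: Callable[[np.ndarray], np.ndarray]


def fuse_elementwise(ops: List[Op]) -> Op:
    """Fuse a chain of elementwise ops into one composite op.

    The graph then dispatches a single node instead of several; a real
    compiler would also eliminate the intermediate buffers between ops.
    """
    def fused(x: np.ndarray) -> np.ndarray:
        for op in ops:
            x = op.fn(x)
        return x

    return Op(name="+".join(op.name for op in ops), fn=fused)


if __name__ == "__main__":
    # Unfused pipeline: ReLU followed by scaling, two separate "kernels".
    pipeline = [Op("relu", lambda x: np.maximum(x, 0.0)),
                Op("scale2", lambda x: x * 2.0)]
    fused = fuse_elementwise(pipeline)

    x = np.random.randn(4, 4).astype(np.float32)
    expected = pipeline[1].fn(pipeline[0].fn(x))
    assert np.allclose(fused.fn(x), expected)
    print("fused op:", fused.name)
```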
“…Therefore, the convolution products for every point are added together, and the result of this computation is the activation output, called the output feature map. Moreover, all input feature maps are processed together as a batch (Yang, 2023), which improves the learning of the filter weights. Additionally, there are other optional layers, as observed in the figure above, such as nonlinearity (which generally evaluates the maximum of two intersecting functions), pooling (which gives the network invariance to small distortions), and normalization, which controls the input distribution through the layers.…”
Section: Neural Network (mentioning)
confidence: 99%
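The statement above walks through how per-channel convolution products are summed into a single activation of the output feature map and how inputs are processed as a batch. The following is a minimal NumPy sketch of that computation, assuming no padding and unit stride; `conv2d_naive` and its shapes are illustrative choices, not code from the cited paper.

```python
import numpy as np


def conv2d_naive(inputs: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Naive 2-D convolution (cross-correlation, as in most DNN frameworks).

    inputs:  (batch, in_channels, H, W)            -- input feature maps
    filters: (out_channels, in_channels, kH, kW)
    returns: (batch, out_channels, H-kH+1, W-kW+1) -- output feature maps
    """
    n, c_in, h, w = inputs.shape
    c_out, _, kh, kw = filters.shape
    out = np.zeros((n, c_out, h - kh + 1, w - kw + 1), dtype=inputs.dtype)

    for b in range(n):                      # every sample in the batch
        for oc in range(c_out):             # every output feature map
            for i in range(h - kh + 1):
                for j in range(w - kw + 1):
                    # Products over all input channels are summed into
                    # one activation of the output feature map.
                    patch = inputs[b, :, i:i + kh, j:j + kw]
                    out[b, oc, i, j] = np.sum(patch * filters[oc])
    return out


if __name__ == "__main__":
    x = np.random.randn(2, 3, 8, 8).astype(np.float32)   # batch of 2, 3 channels
    w = np.random.randn(4, 3, 3, 3).astype(np.float32)   # 4 output feature maps
    print(conv2d_naive(x, w).shape)  # (2, 4, 6, 6)
```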
“…Halide [49] introduces the concept of compute and schedule to separate the software definition from its optimization, while TVM [9] generalizes the concept and allows users to use the tensorize primitive to manually lower parts of the software to spatial accelerators. Automatic schedulers such as the Halide Scheduler [2, 30, 38], FlexTensor [70], ProTuner [19], ALT [63], Rammer [34], NeoFlow [69], and Ansor [68] focus on general-purpose hardware and ignore the mapping problem for spatial accelerators such as Tensor Cores. The polyhedral model is widely used in compilers for constrained optimization [5, 56, 57].…”
Section: Related Work (mentioning)
confidence: 99%
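The compute/schedule split mentioned above keeps what is computed separate from how the loop nest is ordered. Below is a small plain-Python sketch of that separation under stated assumptions: it is not Halide or TVM syntax, the tile size and function names are hypothetical, and both "schedules" realize the same matmul definition C[i, j] = sum_k A[i, k] * B[k, j].

```python
import numpy as np


def matmul_naive(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Naive 'schedule': plain triple loop over the compute definition."""
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(m):
        for j in range(n):
            for kk in range(k):
                c[i, j] += a[i, kk] * b[kk, j]
    return c


def matmul_tiled(a: np.ndarray, b: np.ndarray, tile: int = 16) -> np.ndarray:
    """Tiled 'schedule': the same computation with the loops blocked for
    locality. The compute definition is untouched; only the loop nest changes."""
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n), dtype=a.dtype)
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                c[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile] @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return c


if __name__ == "__main__":
    a = np.random.randn(32, 32).astype(np.float32)
    b = np.random.randn(32, 32).astype(np.float32)
    # Both schedules produce the same result, up to float rounding.
    assert np.allclose(matmul_naive(a, b), matmul_tiled(a, b), atol=1e-3)
```

Auto-schedulers such as those listed in the citation search over choices of this kind (tiling, reordering, vectorization) while leaving the compute definition fixed.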