LISA: Graph Neural Network based Portable Mapping on Spatial Accelerators

Li, Zhaoying; Wu, Dan; Wijerathne, Dhananjaya; Mitra, Tulika

doi:10.1109/hpca53966.2022.00040

Cited by 23 publications

(8 citation statements)

References 46 publications


“…By so, it is possible to compute the number of equivalent MAC per second of the platform, which can reach 6.4 × 10¹¹ MAC/s while working at 10 GHz. Further modeling of the system has shown how the platform can be scaled to 512 channels, reaching more than 20 TOPS/W. The system presented here has shown important features and results.…”
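Suppose the quoted throughput is simply the per-cycle MAC count multiplied by the 10 GHz clock. The excerpt does not spell out how the figure was derived, so this is an assumption. On that reading, the platform sustains the equivalent of 64 multiply-accumulate operations per clock cycle:

```latex
\frac{6.4 \times 10^{11}\ \text{MAC/s}}{10 \times 10^{9}\ \text{cycles/s}} = 64\ \text{MAC/cycle}
```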

Section: Neural Network and Next Steps (mentioning)

confidence: 91%

Design and testing of silicon photonic 4F system for convolutional neural networks

Peserico, Meng, Yang et al. 2023

Integrated Optics: Devices, Materials, and Technologies XXVII


Convolutional neural networks (CNNs) have emerged as the key technology behind most of the novel applications that have appeared in recent years. Convolution, the main operation a CNN performs, has a high computational cost that raises power consumption and latency, especially for large matrices. Optics and photonics can perform the same operation at virtually O(1) cost and with speed-of-light latency, thanks to the properties of Fourier optics. In this paper, we show the implementation of the main components and the modeling of non-idealities that might occur.
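A 4F system uses a lens to take the Fourier transform, applies a filter mask in the Fourier plane, and uses a second lens to transform back. It therefore relies on the convolution theorem: convolution in the signal domain equals pointwise multiplication in the frequency domain.

The digital sketch below only illustrates that theorem. It uses plain Python, a naive O(n²) DFT, and hypothetical helper names (`dft`, `direct_conv`, `fourier_conv`). It says nothing about the photonic hardware or its optical O(1) cost.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; inverse=True gives the inverse DFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def direct_conv(a, b):
    """Linear convolution from the definition: y[n] = sum_k a[k] * b[n - k]."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y

def fourier_conv(a, b):
    """Linear convolution via the convolution theorem: IDFT(DFT(a) * DFT(b)).
    Zero-padding both inputs to len(a) + len(b) - 1 turns the DFT's
    circular convolution into ordinary linear convolution."""
    n = len(a) + len(b) - 1
    A = dft(list(a) + [0.0] * (n - len(a)))
    B = dft(list(b) + [0.0] * (n - len(b)))
    y = dft([p * q for p, q in zip(A, B)], inverse=True)
    return [v.real for v in y]  # inputs are real, so imaginary parts are ~0
```

For example, `direct_conv([1, 2, 3], [0, 1, 0.5])` gives `[0.0, 1.0, 2.5, 4.0, 1.5]`, and `fourier_conv` on the same inputs agrees up to floating-point rounding. The optical system's advantage is that the transforms and the pointwise product happen physically rather than as arithmetic.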


“…Revisiting the capability of synthesis/optimization techniques to address mapping issues in emerging architectures could provide huge benefits in productive software toolchains. Recent examples like NVIDIA's DSL [5], CoSA which looks at pure static machines [28], and ML-based scheduling [37] address some aspects of this. Many unsolved problems remain open to architects, especially when considering static/dynamic hybrids like RED and GPUs.…”

Section: Spatial Schedulers Have Several Unsolved Research Problems (mentioning)

confidence: 99%

The Mozart reuse exposed dataflow processor for AI and beyond

Sankaralingam, Nowatzki, Gangadhar et al. 2022

Proceedings of the 49th Annual International Symposium on Computer Architecture


In this paper we introduce the Mozart Processor, which implements a new processing paradigm called Reuse Exposed Dataflow (RED). RED is a counterpart to the existing execution models of von Neumann, SIMT, Dataflow, and FPGA. Dataflow and data reuse are the fundamental architecture primitives in RED, implemented with mechanisms for inter-worker communication and synchronization. The paper defines the processor architecture, the details of the microarchitecture, chip implementation, software stack development, and performance results. The architecture's goal is to achieve near-CPU-like flexibility with ASIC-like efficiency for a large class of data-intensive workloads. An additional goal was software maturity: to have large coverage of applications immediately, avoiding the need for a long-drawn-out hand-tuning software development phase. The architecture was defined with this software maturity and compiler friendliness in mind. In short, the goal was to do to GPUs what GPUs did to CPUs, i.e., be a better solution for a large range of workloads while preserving flexibility and programmability. The chip was implemented with HBM and PCIe interfaces and taken to production on a 16nm TSMC FFC process. For ML inference tasks with batch size 4, Mozart is integer factors better than state-of-the-art GPUs even while being nearly 2 technology nodes behind. We conclude with a set of lessons learned, the unique challenges of a clean-slate architecture in a commercial setting, and pointers for uncovered research problems. CCS Concepts: • Computer systems organization → Data flow architectures; • Hardware → Hardware accelerators.


“…However, they cannot derive rational heuristics from the mapping problem and require considerable manual effort to tune hyperparameters for specific architectures. The heuristics of machine-learning-directed mapping such as [29,31,61] can be generated automatically only after tedious modeling and training. RAMP and TAEM are mappers using heuristic algorithms designed for specific RSAs.…”

Section: Reconfigurable Spatial Architecture Compilation Overview (mentioning)

confidence: 99%

CaSMap

Man, Zhu, Song et al. 2022

Proceedings of the 49th Annual International Symposium on Computer Architecture


Today, reconfigurable spatial architectures (RSAs) have sprung up as accelerators for compute- and data-intensive domains because they deliver energy and area efficiency close to ASICs while retaining sufficient programmability to keep development cost low. The mapper, which is responsible for mapping algorithms onto RSAs, favors a systematic backtracking methodology because of its high portability across evolving RSA designs. However, exponentially scaling compilation time has become the major obstacle. The key observation of this paper is that the main limiting factor for systematic backtracking mappers is the waterfall mapping model, which resolves all mapping variables and constraints at the same time using single-level intermediate representations (IRs). This work proposes CaSMap, an agile mapper framework independent of the software and hardware of RSAs. By clustering the lowest-level software and hardware IRs into multi-level IRs, the original mapping process can be scattered into multiple decomposed stages, which mitigates the exponential complexity of the mapping problem. This paper introduces (a) strategies for clustering low-level hardware and software IRs using static connectivity and critical path analysis, and (b) a multi-level scattered mapping model in which the higher-level model carries out the heuristics from IR clustering, endeavors to promote mapping success rate, and reduces the scale of the lower-level model. Our evaluation shows that CaSMap reduces the problem scale (nonzeros) by 80.5% (23.1% to 94.9%) and achieves a mapping time speedup of 83× over the state-of-the-art waterfall mapper across four different RSA topologies: MorphoSys, HReA, HyCUBE, and REVEL. CCS Concepts: • Computer systems organization → Reconfigurable computing; • Software and its engineering → Retargetable compilers.
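This toy count suggests why decomposing a mapping problem into stages can tame its exponential growth. It is not CaSMap's algorithm, which works on multi-level IRs and a solver whose problem size is measured in nonzeros. The count rests on simplifying assumptions, and the helper names `flat_search_size` and `staged_search_size` are hypothetical:

- Mapping is a one-to-one placement of operations onto processing elements (PEs).
- Clusters map one-to-one onto equally sized PE groups.
- The lower-level subproblems are independent once the upper level is fixed.

```python
from math import factorial

def flat_search_size(n_ops):
    """Candidate one-to-one placements of n_ops operations onto n_ops PEs
    when every mapping variable is resolved at once (waterfall-style)."""
    return factorial(n_ops)

def staged_search_size(n_clusters, ops_per_cluster):
    """Candidates enumerated by a hypothetical two-stage mapper.
    Stage 1 places n_clusters clusters onto n_clusters PE groups.
    Stage 2 places each cluster's ops inside its own group.
    Assumes (simplification) the stage-2 subproblems are independent
    once stage 1 is fixed, so their costs add rather than multiply."""
    stage1 = factorial(n_clusters)
    stage2 = n_clusters * factorial(ops_per_cluster)
    return stage1 + stage2
```

Take 16 operations split into 4 clusters of 4. `flat_search_size(16)` is 16! (about 2.1 × 10¹³), whereas `staged_search_size(4, 4)` is 4! + 4·4! = 120. The staged count is small partly because it only explores placements consistent with the chosen clustering. A poor clustering can therefore exclude feasible mappings, which is why a multi-level approach must also care about mapping success rate.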

