2017
DOI: 10.1145/3107953

Programming Heterogeneous Systems from an Image Processing DSL

Abstract: Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, "programming," and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language Halide so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler t…

Cited by 98 publications (57 citation statements)
References 47 publications
“…HIPAcc [25], for example, uses a source-to-source compiler from a C-like front-end to generate CUDA, OpenCL, and Renderscript for targeting GPUs. Recent work on Halide [35] has demonstrated targeting heterogeneous systems, including the Xilinx Zynq's FPGA and ARM cores, by generating intermediate C++ and Vivado HLS [33]. Rigel [20] and Darkroom [19] generate Verilog, and PolyMage [14] generates OpenMP and C++ for high-level synthesis.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, there is not much work in the deep learning literature in general, model inference in particular. Pu et al. [32] extended the image processing language, Halide [33], to allow users to specify which part of their applications they want to execute on hardware accelerators (FPGA in their case). Similar to their technique, our approach also allows users to offload different portions of a CNN to different devices so that programmers can quickly build and tune new CNN models.…”
Section: Related Work (mentioning)
confidence: 99%
“…Because D-SWIM and Design1-2 have different throughputs, it was unfair to compare the resource number in Table 5 directly. Thus, we obtained the hardware efficiency of FPGA logic (LUT and REG) with Equation (5). Comparing with the highest-throughput design (Design1), the hardware efficiency of D-SWIM is 4.8× and 8.2× in LUT and REG, respectively.…”
Section: D-SWIM Buffer (mentioning)
confidence: 99%