Jeff Setter scite author profile

Jeff Setter

5 Publications

92 Citation Statements Received

30 Citation Statements Given

How they've been cited

317

How they cite others

173

Affiliations

Stanford University

Publications

Programming Heterogeneous Systems from an Image Processing DSL

Bell

Yang

et al. 2017

ACM Trans. Archit. Code Optim.

Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, "programming," and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language Halide so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler that uses this code to automatically create the accelerator along with the "glue" code needed for the user's application to access this hardware. Starting with Halide not only provides a very high-level functional description of the hardware, but also allows our compiler to generate the complete software program including the sequential part of the workload, which accesses the hardware for acceleration. Our system also provides high-level semantics to explore different mappings of applications to a heterogeneous system, with the added flexibility of being able to map at various throughput rates. We demonstrate our approach by mapping applications to a Xilinx Zynq system. Using its FPGA with two low-power ARM cores, our design achieves up to 6× higher performance and 38× lower energy compared to the quad-core ARM CPU on an NVIDIA Tegra K1, and 3.5× higher performance with 12× lower energy compared to the K1's 192-core GPU.

SWAP: Effective Fine-Grain Management of Shared Last-Level Caches with Minimum Hardware Support

Wang

Chen

Setter

et al. 2017

Interstellar

et al. 2020

Creating an Agile Hardware Design Flow

Bahr

Barrett

Bhagdikar

et al. 2020

Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators

Liu

Setter

Huff

et al. 2023

ACM Trans. Archit. Code Optim.

Image processing and machine learning applications benefit tremendously from hardware acceleration. Existing compilers target either FPGAs, which sacrifice power and performance for programmability, or ASICs, which become obsolete as applications change. Programmable domain-specific accelerators, such as coarse-grained reconfigurable arrays (CGRAs), have emerged as a promising middle-ground, but they have traditionally been difficult compiler targets since they use a different memory abstraction. In contrast to CPUs and GPUs, the memory hierarchies of domain-specific accelerators use push memories: memories that send input data streams to computation kernels or to higher or lower levels in the memory hierarchy, and store the resulting output data streams. To address the compilation challenge caused by push memories, we propose that the representation of these memories in the compiler be altered to directly represent them by combining storage with address generation and control logic in a single structure, a unified buffer. The unified buffer abstraction enables the compiler to separate generic push memory optimizations from the mapping to specific memory implementations in the backend. This separation allows our compiler to map high-level Halide applications to different CGRA memory designs, including some with a ready-valid interface. The separation also opens the opportunity for optimizing push memory elements on reconfigurable arrays. Our optimized memory implementation, the Physical Unified Buffer (PUB), uses a wide-fetch, single-port SRAM macro with built-in address generation logic to implement a buffer with two read and two write ports. It is 18% smaller and consumes 31% less energy than a physical buffer implementation using a dual-port memory that only supports two ports. Finally, our system evaluation shows that enabling a compiler to support CGRAs leads to performance and energy benefits. Over a wide range of image processing and machine learning applications, our CGRA achieves 4.7× better runtime and 3.5× better energy-efficiency compared to an FPGA.

