Abstract-As the complexity of FPGA-based systems scales, the importance of efficiently handling irregular code increases. Recent work has proposed Irregular Code Energy Reducers (ICERs), a high-level synthesis approach for FPGAs that offers significant energy reduction for irregular code compared to a soft core processor. ICERs target the hot-spots of programs, and are seamlessly connected via a shared L1 cache with a soft processor that executes the cold code. This paper evaluates the application of the selective depipelining (SDP) technique to ICERs, which greatly reduces both the execution time and energy of irregular computations. SDP enables irregular computations to be expressed as large, fast, low-power combinational blocks. SDP maintains high memory bandwidth by scheduling the many potentially dependent memory operations within these blocks onto a high-frequency, highly multiplexed coherent memory while scheduling combinational operations at a much lower frequency. SDP is a key enabler for improving the execution properties of irregular computations that are difficult to parallelize. We show that applying SDP to ICERs reduces energy-delay by 2.62× relative to ICERs. ICERs with SDP are up to 2.38× faster than a soft core processor and reduce energy consumption by up to 15.83× for a variety of irregular applications.
I. INTRODUCTION
FPGAs now play host to increasingly large-scale systems with complex and varied behaviors. To manage complexity, many designers are turning toward high-level synthesis (HLS) tools, which allow them to specify system behavior in a high-level language. A common approach incorporates application-specific hardware to execute the highly parallel regions of code, coupled with a soft processor core to handle the remaining code. Although existing HLS tools can target highly structured, parallel code for acceleration, most are ill-suited for the remaining irregular, difficult-to-parallelize code. However, as systems continue to scale, the execution of the non-parallel regions on soft cores limits the efficiency of the system as a whole. Recent work [26], [20], [1] attained energy savings by converting even irregular code regions into specialized circuits.
One such approach [1] improves the energy efficiency of soft-core-based systems by converting hot regions of large, irregular C programs into a collection of energy-saving application-specialized coprocessors called Irregular Code Energy Reducers, or ICERs. Execution jumps between hot code, which runs on the ICERs, and cold code, which runs on the soft core. In both cases, memory operations are performed through a shared L1 cache, which eliminates the need for pointer analysis and enables high code coverage.
Although ICERs were able to achieve up to 9.5× savings in energy for targeted code at approximately the same level of performance, the energy and performance were limited by