Modeling Loop Unrolling: Approaches and Open Issues

Cardoso, João M. P.; Diniz, Pedro C.

doi:10.1007/978-3-540-27776-7_24

Cited by 12 publications

(7 citation statements)

References 12 publications


“…Previous approaches in predicting the impact of loop unrolling include Liao et al [2003] and Cardoso and Diniz [2004]. In Liao et al [2003], the authors propose a model for the hardware realization of kernel loops.…”

Section: Background and Related Work (mentioning)

confidence: 99%

“…In Cardoso and Diniz [2004], the authors propose a model to predict the impact of full loop unrolling on the execution time and on the number of required resources, without explicitly performing it. However, unroll-and-jam (unrolling one or more nested loops in the iteration space and fusing inner loop bodies together) is not covered.…”

Section: Background and Related Work (mentioning)

confidence: 99%
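
The unroll-and-jam transformation described in the statement above can be illustrated with a small sketch. The loop nest, the function names, and the unroll factor of 2 are assumptions chosen for illustration; they are not taken from Cardoso and Diniz [2004] or from the citing paper.

```python
def nested_original(a, b):
    """Reference loop nest: c[i] = sum over j of a[i][j] * b[j]."""
    n, m = len(a), len(b)
    c = [0] * n
    for i in range(n):
        for j in range(m):
            c[i] += a[i][j] * b[j]
    return c


def nested_unroll_and_jam(a, b):
    """Same computation with the outer loop unrolled by 2 and the
    two resulting inner-loop copies fused (jammed) into one loop."""
    n, m = len(a), len(b)
    c = [0] * n
    i = 0
    # Main part: two outer iterations share a single inner loop,
    # so b[j] is read once and used for both rows.
    while i + 1 < n:
        for j in range(m):
            bj = b[j]
            c[i] += a[i][j] * bj
            c[i + 1] += a[i + 1][j] * bj
        i += 2
    # Remainder: one leftover outer iteration when n is odd.
    if i < n:
        for j in range(m):
            c[i] += a[i][j] * b[j]
    return c
```

Plain unrolling of the outer loop would leave two separate inner loops per step; jamming merges them, which is the part the statement says the 2004 model does not cover.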


Optimal Loop Unrolling and Shifting for Reconfigurable Architectures

Dragomir, Stefanov, Bertels. 2009. ACM Trans. Reconfigurable Technol. Syst.

In this article, we present a new technique for optimizing loops that contain kernels mapped on a reconfigurable fabric. We assume the Molen machine organization as our framework. We propose combining loop unrolling with loop shifting, which is used to relocate the function calls contained in the loop body such that in every iteration of the transformed loop, software functions (running on GPP) execute in parallel with multiple instances of the kernel (running on FPGA). The algorithm computes the optimal unroll factor and determines the most appropriate transformation (which can be the combination of unrolling plus shifting or either of the two). This method is based on profiling information about the kernel's execution times on GPP and FPGA, memory transfers and area utilization. In the experimental part, we apply this method to several kernels from loop nests extracted from real-life applications (DCT and SAD from MPEG2 encoder, Quantizer from JPEG, and Sobel's Convolution) and perform an analysis of the results, comparing them with the theoretical maximum speedup by Amdahl's Law and showing when and how our transformations are beneficial.
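
The abstract compares measured results against the theoretical maximum speedup given by Amdahl's Law. As a rough sketch of that bound (the function names and the sample values in the usage note are hypothetical and do not come from the article), the standard formula can be written as:

```python
def amdahl_speedup(kernel_fraction, kernel_speedup):
    """Overall speedup when a fraction of the run time is accelerated.

    kernel_fraction: share of the original execution time spent in the
                     accelerated kernel, in [0, 1] (assumed input).
    kernel_speedup:  factor by which that kernel runs faster (> 0).
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def amdahl_limit(kernel_fraction):
    """Upper bound on overall speedup as the kernel speedup grows without limit."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)
```

For example, with a hypothetical kernel that accounts for 75% of the software run time, `amdahl_limit(0.75)` gives a ceiling of 4x no matter how many kernel instances run in parallel on the FPGA.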


“…Several approaches ([4], [5], [6], [7], [8], [9]) are focused on accelerating kernel loops in hardware. They use different loop transformations (unrolling, pipelining, etc) to exploit parallelism and speedup the kernel.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Loop unrolling and shifting for reconfigurable architectures

Dragomir, Stefanov, Bertels. 2008. 2008 International Conference on Field Programmable Logic and Applications

Loops are an important source of optimization. In this paper, we propose a new technique for optimizing loops that contain kernels mapped on a reconfigurable fabric. We assume the Molen machine organization and programming paradigm as our framework. The method we propose extends our previous work on loop unrolling for reconfigurable architectures by combining unrolling with shifting to relocate the function calls contained in the loop body such that in every iteration of the transformed loop, software functions (running on GPP) execute in parallel with multiple instances of the kernel (running on FPGA). The algorithm is based on profiling information about the kernel's execution times on GPP and FPGA, memory transfers and area utilization. In the experimental part, we apply this method to a loop nest extracted from MPEG2 encoder containing the DCT kernel. The achieved speedup is 19.65x over software execution and 1.8x over loop unrolling.


“…Future plans include high-level estimations to acquire the impact of code transformations, e.g. to decide about loop unrolling [12].…”

Section: Compilation of Software to FPGAs (mentioning)

confidence: 99%

On Estimations for Compiling Software to FPGA-based Systems

Cardoso. 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP'05)

This paper presents recent advances in a compiler infrastructure to map algorithms described in a Java subset to FPGA-based platforms. We explain how delays and resources are estimated to guide the compiler through scheduling and temporal partitioning. The compiler supports complex analytical models to estimate resources and delays for each functional unit. The paper presents experimental results for a number of benchmarks. Those results also raise a question when performing temporal partitioning: shall we try to group as many computational structures as possible in the same configuration, or shall we have several configurations?
