With the increasing demand for high-performance computing in application domains with stringent power budgets, coarse-grained reconfigurable array (CGRA) architectures have become a popular choice among researchers and manufacturers. Loops are the hot spots of kernels running on CGRAs, and hence several techniques have been devised to optimize loop execution. However, work in this direction consists predominantly of software-based solutions. This paper addresses the optimization opportunities at a deeper level and introduces a hardware-based loop control mechanism that supports arbitrarily nested loops up to four levels deep. The major contributions of this work are a lightweight Hardware Loop Block (HLB) for CGRAs that eliminates the control instruction overhead of loops, and an acyclic graph transformation that removes loop branches from the application CDFG. When tested on a set of kernels chosen from various application domains, the design achieved a maximum speed-up of 1.9× and an average speed-up of 1.5× over the conventional approach. The total number of instructions executed is halved for almost all the kernels, with area and power consumption overheads of 2.6% and 0.8%, respectively.
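To make the claim about control instruction overhead concrete, the following is a minimal sketch of an instruction-count model, not the paper's method. It assumes that a software-controlled loop spends three control instructions per iteration at each nesting level (increment, compare, branch) and that a hardware loop block removes all of them. The constant `SW_CONTROL_INSTRS` and both function names are hypothetical, and the model ignores any instructions needed to configure the hardware loop block.

```python
from math import prod

# Assumed per-iteration control cost of a software loop:
# one counter increment, one compare, one branch.
# This value is illustrative and does not come from the paper.
SW_CONTROL_INSTRS = 3


def software_loop_instrs(trip_counts, body_instrs):
    """Instructions executed when every loop level is controlled in software.

    trip_counts lists the iteration count of each level, outermost first.
    body_instrs is the number of instructions in the innermost loop body.
    """
    total = 0
    iterations = 1
    for trips in trip_counts:
        iterations *= trips
        # Control instructions run once per iteration of this level.
        total += iterations * SW_CONTROL_INSTRS
    # The innermost body runs once per innermost iteration.
    total += iterations * body_instrs
    return total


def hardware_loop_instrs(trip_counts, body_instrs):
    """Instructions executed when loop counters are tracked in hardware."""
    if len(trip_counts) > 4:
        # The abstract states support for up to four nesting levels.
        raise ValueError("at most four nesting levels are supported")
    # Only the body instructions remain; loop control costs nothing here.
    return prod(trip_counts) * body_instrs


if __name__ == "__main__":
    nest = [8, 8]  # hypothetical two-level loop nest
    sw = software_loop_instrs(nest, body_instrs=3)
    hw = hardware_loop_instrs(nest, body_instrs=3)
    print(f"software: {sw}, hardware: {hw}, ratio: {sw / hw:.2f}")
```

Under these assumptions, a loop body that is about as small as its control overhead sees its instruction count roughly halved, which is in line with the reduction reported above. The measured speed-ups depend on the actual kernels and architecture and cannot be derived from this model.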
anced performance, energy efficiency and flexibility have bestowed surging popularity on Coarse-Grained Reconfigurable Array (CGRA) architectures. To further improve performance and energy efficiency, several hardware- and software-based loop optimizations have been adopted for CGRAs. In this paper, we propose a centralized hardware-based loop optimization technique that achieves better area and energy results than the previously implemented distributed version. Without incurring any performance degradation, the area overhead against the reference architecture is reduced to 1.5% for a 4×2 CGRA configuration. The centralized version of the hardware loop attains a maximum reduction of 47.3% and an arithmetic-mean reduction of 27.2% in energy consumption compared to the baseline model employing software loops. Furthermore, the paper explores the co-existence of CGRA-specific hardware and software optimizations and their impact on loop efficiency. Enhanced results are obtained by coupling loop unrolling with centralized hardware loop support. The combination achieves up to a 68.7% reduction in energy consumption and a 5.46× speed-up against the baseline model with no optimizations applied.