Dongkwan Suh scite author profile

Modulo-scheduled course-grain reconfigurable array (CGRA) processors excel at exploiting loop-level parallelism at a high performance per watt ratio. The frequent reconfiguration of the array, however, causes between 25% and 45% of the consumed chip energy to be spent on the instruction memory and fetches therefrom. This article presents a hardware/software codesign methodology for such architectures that is able to reduce both the size required to store the modulo-scheduled loops and the energy consumed by the instruction decode logic. The hardware modifications improve the spatial organization of a CGRA's execution plan by reorganizing the configuration memory into separate partitions based on a statistical analysis of code. A compiler technique optimizes the generated code in the temporal dimension by minimizing the number of signal changes. The optimizations achieve, on average, a reduction in code size of more than 63% and in energy consumed by the instruction decode logic by 70% for a wide variety of application domains. Decompression of the compressed loops can be performed in hardware with no additional latency, rendering the presented method ideal for low-power CGRAs running at high frequencies. The presented technique is orthogonal to dictionary-based compression schemes and can be combined to achieve a further reduction in code size. CCS Concepts: • Computer systems organization → Reconfigurable computing; • Hardware → Power estimation and optimization; • Software and its engineering → Compilers;

show abstract

Nop compression scheme for high speed DSPs based on VLIW architecture

Jin

Ahn

Yoo

et al. 2014

View full text Add to dashboard Cite

VLIW (Very Long Instruction Word) is one of the most popular architectures in embedded systems because it has features of low power consumption and low hardware cost. Due to the nature of VLIW architecture such as bundled instructions and large register files, VLIW processors are running with large size of instruction codes in relatively low clock frequency. However compact instruction size and high clock frequency are the most important requirements of modern embedded consumer electronics. In this paper we propose a novel instruction compression scheme to solve the addressed problem. The experiment shows that the proposed scheme can reduce instruction size by 23% and improve clock frequency by 25% in average comparing with conventional compression schemes.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Dongkwan Suh

Design space exploration and implementation of a high performance and low area Coarse Grained Reconfigurable Processor

Improving Energy Efficiency of Coarse-Grain Reconfigurable Arrays Through Modulo Schedule Compression/Decompression

Nop compression scheme for high speed DSPs based on VLIW architecture

Contact Info

Product

Resources

About