Mansureh S. Moghaddam scite author profile

Programmable hardware built on a regular architecture can partially alleviate the problem of increased defect densities associated with transistor scaling by dynamically wiring around the defects [1]. The fine granularity of FPGAs is however unsuitable for effectively exploiting runtime reconfiguration because of the high overheads involved. A coarse grain reconfigurable array with malleable communication links -reMORPH -is proposed in this paper. The compute tile uses DSP48E and BRAM embedded blocks in a Xilinx FPGA and has a very low footprint of about 200 slice LUTs. The semi-systolic near neighbour communication interconnect can be dynamically reconfigured for each "epoch" of computation. The "epoch" or phases of the application are obtained via profiling or static data flow analysis. Some of the links between the compute tiles are changed during the reconfiguration phase which drastically reduces the context switch overhead enabling high performance/area applications to be built on this fabric.

show abstract

Improving Energy Efficiency of Coarse-Grain Reconfigurable Arrays Through Modulo Schedule Compression/Decompression

Lee

Moghaddam

Suh

et al. 2018

ACM Trans. Archit. Code Optim.

View full text Add to dashboard Cite

Modulo-scheduled course-grain reconfigurable array (CGRA) processors excel at exploiting loop-level parallelism at a high performance per watt ratio. The frequent reconfiguration of the array, however, causes between 25% and 45% of the consumed chip energy to be spent on the instruction memory and fetches therefrom. This article presents a hardware/software codesign methodology for such architectures that is able to reduce both the size required to store the modulo-scheduled loops and the energy consumed by the instruction decode logic. The hardware modifications improve the spatial organization of a CGRA's execution plan by reorganizing the configuration memory into separate partitions based on a statistical analysis of code. A compiler technique optimizes the generated code in the temporal dimension by minimizing the number of signal changes. The optimizations achieve, on average, a reduction in code size of more than 63% and in energy consumed by the instruction decode logic by 70% for a wide variety of application domains. Decompression of the compressed loops can be performed in hardware with no additional latency, rendering the presented method ideal for low-power CGRAs running at high frequencies. The presented technique is orthogonal to dictionary-based compression schemes and can be combined to achieve a further reduction in code size. CCS Concepts: • Computer systems organization → Reconfigurable computing; • Hardware → Power estimation and optimization; • Software and its engineering → Compilers;

show abstract

A space- and energy-efficient code compression/decompression technique for coarse-grained reconfigurable architectures

Egger¹,

Lee²,

Kang³

et al. 2017

View full text Add to dashboard Cite

FPGA implementation of convolutional neural network based on stochastic computing

Kim

Moghaddam

Moradian

et al. 2017

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.