Elena Moscu Panainte scite author profile

Abstract-In this paper, we present a polymorphic processor paradigm incorporating both general purpose and custom computing processing. The proposal incorporates an arbitrary number of programmable units, exposes the hardware to the programmers/ designers, and allows them to modify and extend the processor functionality at will. To achieve the previously stated attributes, we present a new programming paradigm, a new instruction set architecture, a microcode-based microarchitecture, and a compiler methodology. The programming paradigm, in contrast with the conventional programming paradigms, allows general-purpose conventional code and hardware descriptions to coexist in a program. In our proposal, for a given instruction set architecture, a onetime instruction set extension of eight instructions is sufficient to implement the reconfigurable functionality of the processor. We propose a microarchitecture based on reconfigurable hardware emulation to allow high-speed reconfiguration and execution. To prove the viability of the proposal, we experimented with the MPEG-2 encoder and decoder and a Xilinx Virtex II Pro FPGA. We have implemented three operations, SAD, DCT, and IDCT. The overall attainable application speedup for the MPEG-2 encoder and decoder is between 2.64-3.18 and between 1.56-1.94, respectively, representing between 93 percent and 98 percent of the theoretically obtainable speedups.

show abstract

Automatic selection of application-specific instruction-set extensions

Galuzzi

Panainte

Yankova

et al. 2006

View full text Add to dashboard Cite

In this paper, we present a general and an efficient algorithm for automatic selection of new application-specific instructions under hardware resources constraints. The instruction selection is formulated as an ILP problem and efficient solvers can be used for finding the optimal solution. An important feature of our algorithm is that it is not restricted to basic-block level nor does it impose any limitation on the number of the newly added instructions or on the number of the inputs/outputs of these instructions. The presented results show that a significant overall application speedup is achieved even for large kernels (for ADPCM decoder the speedup ranges from x1.2 to x3.7) and that our algorithm compares well with other state-of-art algorithms for automatic instruction set extensions.

show abstract

MORPHEUS: Heterogeneous Reconfigurable Computing

Thoma

Kühnle

Bonnot

et al. 2007

View full text Add to dashboard Cite

The Molen compiler for reconfigurable processors

Panainte

Bertels

Vassiliadis

2007

ACM Trans. Embed. Comput. Syst.

View full text Add to dashboard Cite

In this paper, we describe the compiler developed to target the Molen reconfigurable processor and programming paradigm. The compiler automatically generates optimized binary code for C applications, based on pragma annotation of the code executed on the reconfigurable hardware. For the IBM PowerPC 405 processor included in the Virtex II Pro platform FPGA, we implemented code generation, register, and stack frame allocation following the PowerPC EABI (embedded application binary interface). The PowerPC backend has been extended to generate the appropriate instructions for the reconfigurable hardware and data transfer, taking into account the information of the specific hardware implementations and system. Starting with an annotated C application, a complete design flow has been integrated to generate the executable bitstream for the reconfigurable processor. The flexible design of the proposed infrastructure allows to consider the special features of the reconfigurable architectures. In order to hide the reconfiguration latencies, we implemented an instruction-scheduling algorithm for the dynamic hardware configuration instructions. The algorithm schedules, in advance, the hardware configuration instructions, taking into account the conflicts for the reconfigurable hardware resources (FPGA area) between the hardware operations. To verify the Molen compiler, we used the multimedia video frame M-JPEG encoder of which the extended discrete cosine transform (DCT*) function was mapped on the FPGA. We obtained an overall speedup of 2.5 (about 84% efficiency over the maximal theoretical speedup of 2.96). The performance efficiency is achieved using automatically generated nonoptimized DCT* hardware implementation. The instruction-scheduling algorithm has been tested for DCT, quantization, and VLC operations. Based on simulation results, we determine that, while a simple scheduling produces a significant performance decrease, our proposed scheduling contributes for up to 16x M-JPEG encoder speedup.

show abstract

Instruction Scheduling for Dynamic Hardware Configurations

Panainte

Bertels

Vassiliadis

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.