Yung-Chia Lin scite author profile

Tang et al. 2006

Abstract. The compiler is widely regarded as the most essential component of the software toolchain for a successful processor design. This paper describes our preliminary use of the Open Research Compiler (ORC) infrastructure on a novel VLIW DSP processor (known as the PAC DSP core) and its specific compilation and optimization design. The PAC DSP processor makes extensive use of port-restricted, distinctly partitioned register file structures in addition to a heterogeneous clustered datapath architecture to attain low power consumption and reduced die size; however, these architectural features pose new challenges for compiler construction. As part of an effort to deal with the challenges of efficient code generation for the PAC DSP, the register allocation scheme developed in this work and other retargeting optimization phases are also presented. Results indicate that our compiler development for the PAC DSP can give an early estimate of architecture performance, so that the architecture can be refined based on software feedback. Our experience in designing compiler support for heterogeneous VLIW DSP processors with irregular resource constraints may benefit those interested in compiler construction for similar architectures.

LC‐GRFA: global register file assignment with local consciousness for VLIW DSP processors with non‐uniform register files

Concurrency and Computation

You et al. 2008

SUMMARY. Embedded processors developed within the past few years have employed novel hardware designs to reduce ever-growing complexity, power dissipation, and die area. Although a distributed register file architecture is considered to require fewer read/write ports than a traditional unified register file structure, it makes it challenging for compilation techniques to generate efficient code. This paper presents a novel scheme for register allocation that includes global and local components on a VLIW DSP processor with distributed register files whose port access is highly restricted. In the scheme, an optimization phase performed prior to conventional global/local register allocation, named global/local register file assignment (RFA), is used to minimize various register file communication costs. A heuristic algorithm is proposed for global RFA to make suitable decisions based on local RFA. Experiments were performed by incorporating our schemes on a novel VLIW DSP processor with non-uniform register files. The results indicate that compilation based on our proposed approach delivers significant performance improvements compared with the solution that does not use our proposed global register allocation scheme.

Effective Code Generation for Distributed and Ping-Pong Register Files: A Case Study on PAC VLIW DSP Cores

J Sign Process Syst Sign Image

et al. 2007

Abstract. The compiler is generally regarded as the most important software component supporting the success of a processor design. This paper describes our application of the Open Research Compiler infrastructure to a novel VLIW DSP (known as the PAC DSP core) and the specific design of code generation for its register file architecture. The PAC DSP utilizes port-restricted, distributed, and partitioned register file structures in addition to a heterogeneous clustered data-path architecture to attain low power consumption and a smaller die. As part of an effort to overcome the new challenges of code generation for the PAC DSP, we have developed a new register allocation scheme and other retargeting optimization phases that allow the effective generation of high-quality code. Our preliminary experimental results indicate that our compiler can efficiently utilize the features of the specific register file architectures in the PAC DSP. Our experiences in designing compiler support for the PAC VLIW DSP with irregular resource constraints may also be of interest to those involved in developing compilers for similar architectures.

Compiler Optimizations with DSP-Specific Semantic Descriptions

Hwang

Lee

2005

Energy-aware scheduling and simulation methodologies for parallel security processors with multiple voltage domains

et al. 2007

Dynamic voltage scaling (DVS) and power gating (PG) have become mainstream technologies for low-power optimization in recent years. One issue that remains to be solved is integrating these techniques in correlated domains operating with multiple voltages. This article addresses the problem of power-aware task scheduling on a scalable cryptographic processor that is designed as a heterogeneous and distributed system-on-a-chip, with the aim of effectively integrating DVS, PG, and the scheduling of resources in multiple voltage domains (MVD) to achieve low energy consumption. Our approach uses an analytic model as the basis for estimating the performance and energy requirements between different domains and addressing the scheduling issues for correlated resources in systems. We also present the results of performance and energy simulations from transaction-level models of our security processors in a variety of system configurations. The prototype experiments show that our proposed methods yield significant energy reductions. The proposed techniques will be useful for implementing DVS and PG in domains with multiple correlated resources.