Abstract.Compiler is substantially regarded as the most essential component in the software toolchain to promote a successful processor design. This paper describes our preliminary employment of the Open Research Compiler (ORC) infrastructure on a novel VLIW DSP processor (known as PAC DSP core) and its specific compilation and optimization design. The PAC DSP processor exceedingly utilized port-restricted, distinct partitioned register file structures in addition to the heterogeneous clustered datapath architecture to attain low power consumption and reduced die size; however, these architectural features lend new challenges to the compiler construction. As part of an effort to deal with the challenges of efficient code generation for PAC DSP, the register allocation scheme developed in this work and other retargeting optimization phases are also presented. Results indicated that our compiler development for PAC DSP could gives an early estimation of architecture performance so that refinements of architectures are possible with the software feedbacks. Our experiences in designing the compiler support for heterogeneous VLIW DSP processors with irregular resource constraints may benefit those who have interests in the compiler construction for the similar architectures.
SUMMARYEmbedded processors developed within the past few years have employed novel hardware designs to reduce the ever-growing complexity, power dissipation, and die area. Although using a distributed register file architecture is considered to have less read/write ports than using traditional unified register file structures, it presents challenges in compilation techniques to generate efficient codes for such architectures. This paper presents a novel scheme for register allocation that includes global and local components on a VLIW DSP processor with distributed register files whose port access is highly restricted. In the scheme, an optimization phase performed prior to conventional global/local register allocation, named global/local register file assignment (RFA), is used to minimize various register file communication costs. A heuristic algorithm is proposed for global RFA to make suitable decisions based on local RFA. Experiments were performed by incorporating our schemes on a novel VLIW DSP processor with non-uniform register files. The results indicate that the compilation based on our proposed approach delivers significant performance improvements, compared with the solution without using our proposed global register allocation scheme.
Abstract. The compiler is generally regarded as the most important software component that supports a processor design to achieve success. This paper describes our application of the open research compiler infrastructure to a novel VLIW DSP (known as the PAC DSP core) and the specific design of code generation for its register file architecture. The PAC DSP utilizes port-restricted, distributed, and partitioned register file structures in addition to a heterogeneous clustered data-path architecture to attain low power consumption and a smaller die. As part of an effort to overcome the new challenges of code generation for the PAC DSP, we have developed a new register allocation scheme and other retargeting optimization phases that allow the effective generation of high quality code. Our preliminary experimental results indicate that our developed compiler can efficiently utilize the features of the specific register file architectures in the PAC DSP. Our experiences in designing compiler support for the PAC VLIW DSP with irregular resource constraints may also be of interest to those involved in developing compilers for similar architectures.
Dynamic voltage scaling (DVS) and power gating (PG) have become mainstream technologies for low-power optimization in recent years. One issue that remains to be solved is integrating these techniques in correlated domains operating with multiple voltages. This article addresses the problem of power-aware task scheduling on a scalable cryptographic processor that is designed as a heterogeneous and distributed system-on-a-chip, with the aim of effectively integrating DVS, PG, and the scheduling of resources in multiple voltage domains (MVD) to achieve low energy consumption. Our approach uses an analytic model as the basis for estimating the performance and energy requirements between different domains and addressing the scheduling issues for correlated resources in systems. We also present the results of performance and energy simulations from transaction-level models of our security processors in a variety of system configurations. The prototype experiments show that our proposed methods yield significant energy reductions. The proposed techniques will be useful for implementing DVS and PG in domains with multiple correlated resources.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations鈥揷itations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright 漏 2024 scite LLC. All rights reserved.
Made with 馃挋 for researchers
Part of the Research Solutions Family.