Clustering is an effective method to increase the available parallelism in VLIW datapaths without incurring the severe penalties associated with a large number of register file ports. Efficient utilization of a clustered datapath requires careful binding (assignment) of operations to clusters. The article proposes a binding algorithm that effectively explores trade-offs between in-cluster operation serialization and delays associated with data transfers between clusters. Extensive experimental evidence is provided showing that the algorithm generates high-quality solutions for representative kernels, with up to 33% improvement over a state-of-the-art binding algorithm.
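The trade-off named in this abstract can be illustrated with a minimal sketch. This is not the article's binding algorithm; it is a simple greedy heuristic under assumed conditions: unit-latency operations, a fixed inter-cluster transfer delay, and a fixed number of functional units per cluster. The names `bind_greedy`, `fus_per_cluster`, and `transfer_delay` are hypothetical.

```python
from collections import defaultdict


def bind_greedy(ops, deps, n_clusters, fus_per_cluster=1, transfer_delay=1):
    """Greedily bind operations to clusters (illustrative heuristic only).

    ops:  operation names in topological order.
    deps: dict mapping an operation to the list of its predecessors.
    All operations are assumed to take one cycle.
    Returns (binding, finish), where binding[op] is a cluster index and
    finish[op] is the cycle at which the result of op becomes available.
    """
    binding = {}
    finish = {}
    # busy[c][t] counts the operations issued in cluster c at cycle t.
    busy = [defaultdict(int) for _ in range(n_clusters)]

    for op in ops:
        best = None
        for c in range(n_clusters):
            # Operand availability: a predecessor bound to another cluster
            # pays the inter-cluster transfer delay.
            ready = 0
            for p in deps.get(op, []):
                penalty = 0 if binding[p] == c else transfer_delay
                ready = max(ready, finish[p] + penalty)
            # In-cluster serialization: wait for a free functional unit.
            t = ready
            while busy[c][t] >= fus_per_cluster:
                t += 1
            # Keep the cluster giving the earliest issue cycle
            # (ties go to the lowest cluster index).
            if best is None or t < best[0]:
                best = (t, c)
        t, c = best
        binding[op] = c
        busy[c][t] += 1
        finish[op] = t + 1
    return binding, finish


# Toy dataflow graph: c consumes a and b; d consumes c.
ops = ["a", "b", "c", "d"]
deps = {"c": ["a", "b"], "d": ["c"]}
binding, finish = bind_greedy(ops, deps, n_clusters=2)
print(binding, max(finish.values()))
```

In this toy graph, spreading `a` and `b` over two clusters avoids serializing them, but `c` then waits one cycle for a transferred operand, so the schedule length matches the single-cluster case. With `transfer_delay=0` the two-cluster schedule is one cycle shorter, which is the tension a binding algorithm has to balance.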
Embedded systems form a market that is already larger and growing more rapidly than that of general-purpose computers. In fact, real-time multimedia and signal processing embedded applications currently account for over 90% of all computer cycles [8]. Our focus in this article will be on an increasingly important set of embedded applications, consisting of portable systems in the areas of digital communications and multimedia consumer electronics (e.g., cellular phones, personal digital assistants, digital video cameras, and multimedia terminals). These complex systems rely on "power-hungry" algorithms for high-bandwidth wireless communications, video compression and decompression, handwriting recognition, speech and image processing, etc. The portability of these systems makes energy consumption a particularly critical design concern, because high energy consumption shortens battery life. Moreover, high power dissipation leads to more expensive packaging and decreases reliability. At the same time, levels of microelectronic integration continue to rise, enabling more integrat-
Abstract: Specialized clustered very long instruction word (VLIW) processors combined with effective compilation techniques enable aggressive exploitation of the high instruction-level parallelism inherent in many embedded media applications, while unlocking a variety of possible performance/cost tradeoffs. In this work, the authors propose a methodology to support early design space exploration of clustered VLIW datapaths, in the context of a specific target application. They argue that, due to the large size and complexity of the design space, the early design space exploration phase should consider only design space parameters that have a first-order impact on two key physical figures of merit: clock rate and power dissipation. These parameters were found to be: maximum cluster capacity, number of clusters, and bus (interconnect) capacity. Experimental validation of their design space exploration algorithm shows that a thorough exploration of the complex design space can be performed very efficiently in this abstract parameterized design space. Moreover, an empirical study carried out on a representative set of computation-intensive benchmarks suggests that "penalties" of clustered versus centralized datapaths are often minimal and that clustering indeed unlocks a variety of valuable design tradeoffs.
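To make the three-parameter design space concrete, the sketch below enumerates combinations of number of clusters, maximum cluster capacity, and bus capacity, and keeps the Pareto-optimal points. It is not the authors' exploration algorithm. The function names and the evaluation model in `toy_evaluate` (latency shrinking with total capacity, cost growing quadratically with cluster capacity as a stand-in for register file port cost) are placeholder assumptions chosen only for illustration.

```python
from itertools import product


def pareto_front(points):
    """Return the points not dominated in (latency, cost); lower is better."""
    front = []
    for p in points:
        dominated = any(
            q["latency"] <= p["latency"]
            and q["cost"] <= p["cost"]
            and (q["latency"] < p["latency"] or q["cost"] < p["cost"])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


def explore(cluster_counts, capacities, bus_capacities, evaluate):
    """Evaluate every parameter combination and keep the Pareto front."""
    points = []
    for n, cap, bus in product(cluster_counts, capacities, bus_capacities):
        latency, cost = evaluate(n, cap, bus)
        points.append(
            {"clusters": n, "capacity": cap, "bus": bus,
             "latency": latency, "cost": cost}
        )
    return pareto_front(points)


def toy_evaluate(n_clusters, capacity, bus_capacity):
    """Placeholder model (an assumption, not taken from the paper)."""
    latency = 12 // (n_clusters * capacity)                # 12 units of work
    cost = n_clusters * capacity * capacity + bus_capacity  # superlinear in capacity
    return latency, cost


for point in explore([1, 2], [1, 2], [1], toy_evaluate):
    print(point)
```

Under this placeholder model, two clusters of capacity 1 dominate a single cluster of capacity 2: both reach the same latency, but the clustered configuration has the lower cost. A real exploration would replace `toy_evaluate` with scheduling results and physical estimates for the target application.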