SynZENa hybrid TTA/VLIW architecture with a distributed register file
Conference object, Postprint versionThis version is available at http://dx.doi.org/10.14279/depositonce-5743.
Suggested CitationHauser, Stefan; Moser, Nico; Juurlink, Ben: SynZEN: a hybrid TTA/VLIW architecture with a distributed register file. Abstract-The quest for higher performance within a certain power budget in the fields of embedded computing demands unconventional architectural approaches. To this end, in this paper we present synZEN (sZ): a (micro-)architecture that combines features of very long instruction word (VLIW) and transport triggered architectures (TTAs) to cover the needs of different applications. SynZEN features a distributed register file (RF) (i.e., each functional unit (FU) has its own RF) and a wide memory connection to exploit spatial data locality. FPGA synthesis results demonstrate that due to the distributed RF the sZ design can be implemented in less area (in terms of FPGA slices) than existing TTA and VLIW designs. Furthermore, using two micro-benchmarks we show that because of the wide memory connection, sZ outperforms both the TTA as well as the VLIW design.
I. INTRODUCTIONWe present a processor architecture, called synZEN (sZ), which was initially designed as an application-specific processor. The design goals of this architecture are to utilize inherent parallelism, provide flexibility, and to be scalable. Therefore our architecture combines features of two well known architectural concepts. We combine the powerful VLIW FUs with a flexible interconnection network (ICN) based on the TTA concept.This paper focuses on presenting the micro-architecture and outlines other engineering aspects necessary to understand the architecture. Furthermore, we compare our approach to a VLIW architecture and a TTA implementation. The main contributions of this paper can be summarized as follows:• We present an architecture which has the potential of extracting high inherent instruction level parallelism (ILP) of applications. Due to the very good architecture scalability the extractable ILP scales too.• sZ features a wide memory interface to the data memory, which allows parallel access to consecutively stored data which supports the parallelism of the architecture.• With the distributed RF we significantly increase the scalability and decrease the overall costs while avoiding reducing the number of registers each FU has access to.• We evaluate this hybrid architecture by comparing its performance and cost to similar architectures. It is shown that the combination of architectural features of different architectures result in higher performance as well as in lower resource consumption. We also show that the resource consumption of our local registers increases only linearly with the increasing number of FUs in contrast to the quickly increasing cost of architectures with a central RF.