A shared reconfigurable VLIW multiprocessor system

Anjam, Fakhar; Wong, Stephan; Nadeem, Faisal

doi:10.1109/ipdpsw.2010.5470734

Cited by 3 publications

(4 citation statements)

References 15 publications


“…A promising approach by using FPGA specific features to solve the multiport RF challenge is shown in [4]. This work also shows a way to use resource sharing in multiprocessor settings, where control logic supports the use of functional logic in different data paths [5]. Compared to our architecture, the biggest bottlenecks are the restricted memory interface and the central RF.…”

Section: Related Work (mentioning)

confidence: 98%

SynZEN: A hybrid TTA/VLIW architecture with a distributed register file

2012


SynZEN: a hybrid TTA/VLIW architecture with a distributed register file. Conference object, Postprint version. This version is available at http://dx.doi.org/10.14279/depositonce-5743.

Suggested Citation: Hauser, Stefan; Moser, Nico; Juurlink, Ben: SynZEN: a hybrid TTA/VLIW architecture with a distributed register file.

Abstract: The quest for higher performance within a certain power budget in the fields of embedded computing demands unconventional architectural approaches. To this end, in this paper we present synZEN (sZ): a (micro-)architecture that combines features of very long instruction word (VLIW) and transport triggered architectures (TTAs) to cover the needs of different applications. SynZEN features a distributed register file (RF) (i.e., each functional unit (FU) has its own RF) and a wide memory connection to exploit spatial data locality. FPGA synthesis results demonstrate that due to the distributed RF the sZ design can be implemented in less area (in terms of FPGA slices) than existing TTA and VLIW designs. Furthermore, using two micro-benchmarks we show that because of the wide memory connection, sZ outperforms both the TTA as well as the VLIW design.

I. INTRODUCTION

We present a processor architecture, called synZEN (sZ), which was initially designed as an application-specific processor. The design goals of this architecture are to utilize inherent parallelism, provide flexibility, and to be scalable. Therefore our architecture combines features of two well known architectural concepts. We combine the powerful VLIW FUs with a flexible interconnection network (ICN) based on the TTA concept.

This paper focuses on presenting the micro-architecture and outlines other engineering aspects necessary to understand the architecture. Furthermore, we compare our approach to a VLIW architecture and a TTA implementation.

The main contributions of this paper can be summarized as follows:

• We present an architecture which has the potential of extracting high inherent instruction level parallelism (ILP) of applications. Due to the very good architecture scalability the extractable ILP scales too.

• sZ features a wide memory interface to the data memory, which allows parallel access to consecutively stored data which supports the parallelism of the architecture.

• With the distributed RF we significantly increase the scalability and decrease the overall costs while avoiding reducing the number of registers each FU has access to.

• We evaluate this hybrid architecture by comparing its performance and cost to similar architectures. It is shown that the combination of architectural features of different architectures result in higher performance as well as in lower resource consumption. We also show that the resource consumption of our local registers increases only linearly with the increasing number of FUs in contrast to the quickly increasing cost of architectures with a central RF.


“…The most important feature of the SC-SS method is that the superscalar architecture used in it enables the parallel execution of many simple arithmetic operations, controlled by the very-long instruction word (VLIW). Examples of solutions that can be classified to the SC-SS method are presented, for example, in Reference [19][20][21]. Dedicated signal processors are also designed based on the super-scalar VLIW architecture, e.g., Refs.…”

Section: Different Ways Of Algorithms Implementation On FPGA (mentioning)

confidence: 99%

Fixed-Point Arithmetic Unit with a Scaling Mechanism for FPGA-Based Embedded Systems

Przybył

2021

Electronics


The work describes the new architecture of a fixed-point arithmetic unit. It is based on the use of integer arithmetic operations for which the information about the scale of the processed numbers is contained in the binary code of the arithmetic instruction being executed. Therefore, this approach is different from the typical way of implementing fixed-point operations on standard processors. The presented solution is also significantly different from the one used in floating-point arithmetic, as the decision to determine the appropriate scale is made at the stage of compiling the code and not during its execution. As a result, the real-time processing of real numbers is simplified and, therefore, faster. The described method provides a better ratio of the processing efficiency to the complexity of the digital system than other methods. In particular, the advantage of using the described method in FPGA-based embedded control systems should be indicated. Experimental tests on an industrial servo-drive confirm the correctness of the described solution.
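The idea in this abstract, that the scale of a fixed-point result is carried in the instruction and chosen at compile time, can be illustrated with a minimal sketch. Everything named here (`MulInstr`, `compile_mul`, `execute`, the Q-format helpers) is hypothetical and is not taken from the cited paper; it only shows an integer multiply whose right shift is an immediate fixed before execution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MulInstr:
    """Hypothetical multiply instruction; the shift is an immediate field."""
    shift: int  # fractional bits dropped from the raw product


def to_fixed(x: float, frac_bits: int) -> int:
    """Encode a real number as an integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def to_float(q: int, frac_bits: int) -> float:
    """Decode a fixed-point integer back to a real number."""
    return q / (1 << frac_bits)


def compile_mul(frac_a: int, frac_b: int, frac_out: int) -> MulInstr:
    """'Compile time': derive the shift from the operand and result formats."""
    shift = frac_a + frac_b - frac_out
    if shift < 0:
        raise ValueError("result format has more fractional bits than the product")
    return MulInstr(shift)


def execute(instr: MulInstr, a: int, b: int) -> int:
    """'Run time': plain integer multiply and shift, no scale decisions."""
    return (a * b) >> instr.shift


if __name__ == "__main__":
    instr = compile_mul(8, 8, 8)  # all values use 8 fractional bits (assumed)
    q = execute(instr, to_fixed(1.5, 8), to_fixed(2.25, 8))
    print(instr.shift, q, to_float(q, 8))
```

In this sketch the run-time path contains only integer operations, which is the property the abstract contrasts with floating-point arithmetic, where the scale is determined during execution.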


“…Similarly, Anjam et al presented in [5] a VLIW-based dual-processor system that shares a single execution unit amongst the two CPUs. That system implements a resource controller that time-shares the VLIW execution unit.…”

Section: Introduction (mentioning)

confidence: 99%

“…The rationale behind the described approach is that by having a shared execution unit, less resources and energy are required. However, that system [5] is completely static as it cannot be modified at run-time.…”

Section: Introduction (mentioning)

confidence: 99%

A Soft Dual-Processor System with a Partially Run-Time Reconfigurable Shared 128-Bit SIMD Engine

Ordaz

Koch

2018

2018 IEEE 29th International Conference on Application-Specific Systems, Architectures and Processors (ASAP)


In this work, we present a soft dual-processor system that, as a distinctive feature, seamlessly integrates a partially run-time reconfigurable 128-bit SIMD engine. Importantly, the SIMD engine is tightly coupled to both scalar CPUs and it is shared amongst them with the purpose of drastically improving overall area utilization. We show that the proposed SIMD engine increases performance-per-area and that it can be used to substantially accelerate time consuming kernels for a set of media applications.
