Impact on performance of fused multiply-add units in aggressive VLIW architectures

López, David; Llosa, Josep; Ayguadé, Eduard; Valero, Mateo

doi:10.1109/icpp.1999.797384

Cited by 2 publications (3 citation statements)

References 29 publications

“…The paper targets numerical applications based on FP operations. Here, we extend and improve the previous research work done in [22], [23], and [24]. In this evaluation, we take into account the individual impact of the static scheduler, register file size, area, and cycle time.…”

Section: Introduction (mentioning; confidence: 88%)

Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures

López, Llosa, Valero, et al. 2001, IEEE Trans. Comput. (self-citation)

Loops are the main time-consuming part of numerical applications. The performance of the loops is limited either by the resources offered by the architecture or by recurrences in the computation. To execute more operations per cycle, current processors are designed with growing degrees of resource replication (replication technique) for memory ports and functional units. However, the high cost in terms of area and cycle time of this technique precludes the use of high degrees of replication. High values for the cycle time may clearly offset any gain in terms of number of execution cycles. High values for the area may lead to an unimplementable configuration. An alternative to resource replication is resource widening (widening technique), which has also been used in some recent designs in which the width of the resources is increased (i.e., a single operation is performed over multiple data). Moreover, several general-purpose superscalar microprocessors have been implemented with multiply-add fused floating-point units (fusion technique), which reduces the latency of the combined operation and the number of resources used. In this paper, we evaluate a broad set of VLIW processor design alternatives that combine the three techniques. We perform a technological projection for the next processor generations in order to foresee the possible implementable alternatives. From this study, we conclude that if the cost is taken into account, combining certain degrees of replication and widening in the hardware resources is more effective than applying only replication. Also, we confirm that multiply-add fused units will have a significant impact in raising the performance of future processor architectures with a reasonable increase in cost.

Index Terms: VLIW processors, instruction level parallelism, software pipelining, numerical applications, performance/cost trade-off.
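To make the three techniques concrete, here is a minimal sketch of the resource-constrained lower bound on the initiation interval (ResMII) used in software pipelining. The modeling choices are assumptions for illustration, not the paper's methodology: widening of degree `width` is modeled as unrolling the loop `width` times and packing each group of identical operations into one wide operation (so every operation is assumed compactable), and fusion is modeled as removing one FP operation for each multiply whose result feeds only an add. The `Loop` and `Machine` types and the DAXPY-like operation counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Loop:
    """Operation counts for one iteration of a loop body (hypothetical)."""
    mem_ops: int        # loads + stores
    fmuls: int
    fadds: int
    fusable_pairs: int  # multiplies whose result feeds only an add


@dataclass
class Machine:
    """Hypothetical VLIW resource configuration."""
    mem_ports: int      # replication degree of memory ports
    fpus: int           # replication degree of FP units
    width: int = 1      # widening degree: data elements per operation
    fma: bool = False   # fusion: FP units can execute multiply-add


def res_mii_per_iteration(loop: Loop, m: Machine) -> float:
    """Resource-constrained cycles per original loop iteration.

    Widening is modeled as unrolling by `width` and packing each group of
    `width` identical operations into one wide operation, so the unrolled
    body issues as many (wide) operations as one original iteration.
    This assumes every operation is compactable.
    """
    fp_ops = loop.fmuls + loop.fadds
    if m.fma:
        fp_ops -= loop.fusable_pairs  # each fused mul+add pair issues once
    ii_unrolled = max(
        math.ceil(loop.mem_ops / m.mem_ports),
        math.ceil(fp_ops / m.fpus),
    )
    return ii_unrolled / m.width


# DAXPY-like body y[i] = a * x[i] + y[i]: 2 loads + 1 store, 1 mul, 1 add.
daxpy = Loop(mem_ops=3, fmuls=1, fadds=1, fusable_pairs=1)

base = Machine(mem_ports=2, fpus=2)
replicated = Machine(mem_ports=4, fpus=4)
widened = Machine(mem_ports=2, fpus=2, width=2)
fp_bound = Machine(mem_ports=4, fpus=1)
fused = Machine(mem_ports=4, fpus=1, fma=True)

for name, m in [("base", base), ("replicated", replicated),
                ("widened", widened), ("fp-bound", fp_bound), ("fused", fused)]:
    print(name, res_mii_per_iteration(daxpy, m))
```

Under these assumptions, doubling the width of the base resources reaches the same bound as doubling their number (1.0 instead of 2.0 cycles per iteration), and on the configuration limited by a single FP unit, fusion halves the bound. The numbers illustrate the model only; they are not results from the paper.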

“…On the other hand, using FMA functional units can reduce the MII of some loops. It also reduces the need for spill code (because no register is required to store the intermediate result) and reduces the complexity of the scheduled graph, increasing the likelihood of the scheduler finding an optimal schedule [22].…”

Section: Fusion (mentioning; confidence: 99%)

Cost-conscious strategies to increase performance of numerical programs on aggressive VLIW architectures (López, Llosa, Valero, et al. 2001, IEEE Trans. Comput.; self-citation)
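The MII of a software-pipelined loop is the larger of its resource bound and its recurrence bound (RecMII); RecMII is the maximum, over the loop's dependence cycles, of the summed operation latencies divided by the summed iteration distances, rounded up. The sketch below uses latencies that are hypothetical, not taken from the paper, to show why fusion reduces the MII of only some loops: when both the multiply and the add lie on a recurrence, one FMA shortens the cycle, but when only the add does, as in a dot-product reduction, a fused unit slower than the adder can lengthen it.

```python
import math

# Hypothetical latencies in cycles; actual values depend on the processor.
LATENCY = {"fmul": 4, "fadd": 4, "fma": 5}


def rec_mii(cycles: list[tuple[list[str], int]]) -> int:
    """Recurrence-constrained MII.

    `cycles` lists each dependence cycle as (operations on the cycle,
    total iteration distance around the cycle).
    """
    return max(
        math.ceil(sum(LATENCY[op] for op in ops) / distance)
        for ops, distance in cycles
    )


# First-order linear recurrence s = s * a[i] + b[i]:
# both the multiply and the add lie on the cycle.
linear_split = rec_mii([(["fmul", "fadd"], 1)])
linear_fused = rec_mii([(["fma"], 1)])

# Dot-product reduction s = s + x[i] * y[i]: only the add lies on the cycle.
dot_split = rec_mii([(["fadd"], 1)])
dot_fused = rec_mii([(["fma"], 1)])

print(linear_split, linear_fused, dot_split, dot_fused)
```

With these assumed latencies, fusion lowers RecMII from 8 to 5 cycles for the linear recurrence but raises it from 4 to 5 for the reduction, so the benefit depends on where the multiply sits relative to the recurrence.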

“…However, the cost of these changes in the FPUs and ALUs is limited (as, for instance, the area of these units is dominated by the multiplier). The biggest cost is instead on the complexity of the register file, which is required to provide more operators to the functional units [52].…”

Section: RISC-V Vector Processors (mentioning; confidence: 99%)
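The register-file cost in this statement can be estimated to first order by counting ports: an FP add or multiply reads two source registers, a fused multiply-add reads three, and each unit writes one result per cycle. The sketch below assumes, as a common first-order model rather than a figure from the cited work, that register-file area grows with the square of the total port count; the `FpCluster` type and the unit counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FpCluster:
    """Hypothetical group of identical FP units sharing one register file."""
    units: int
    fused: bool  # True: FMA units (3 sources); False: add/mul units (2 sources)

    def read_ports(self) -> int:
        return self.units * (3 if self.fused else 2)

    def write_ports(self) -> int:
        return self.units  # one result per unit per cycle

    def total_ports(self) -> int:
        return self.read_ports() + self.write_ports()


def relative_area(config: FpCluster, baseline: FpCluster) -> float:
    """First-order assumption: register-file area scales with ports ** 2."""
    return (config.total_ports() / baseline.total_ports()) ** 2


plain = FpCluster(units=4, fused=False)
fma = FpCluster(units=4, fused=True)

print(plain.read_ports(), plain.write_ports())  # 8 4
print(fma.read_ports(), fma.write_ports())      # 12 4
print(round(relative_area(fma, plain), 2))      # 1.78
```

Memory-port reads and writes are ignored here; adding the same number of them to both configurations would shrink the relative increase.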

On-Board Decision Making in Space with Deep Neural Networks and RISC-V Vector Processors

Mascio, Menicucci, Gill, et al. 2021

Journal of Aerospace Information Systems

The use of deep neural networks (DNNs) in terrestrial applications went from niche to widespread in a few years, thanks to relatively inexpensive hardware for both training and inference and the availability of large datasets. Applying this paradigm to space systems, where neither large datasets nor inexpensive hardware is readily available, is more difficult and thus still rare. This paper analyzes the impact of DNNs on the system-level capabilities of space systems in terms of on-board decision making (OBDM) and identifies the specific criticalities of deploying DNNs on satellites. The workload of DNNs for on-board image and telemetry analysis is analyzed, and the results are used to drive the preliminary design of a RISC-V vector processor to be employed as a generic platform to enable energy-efficient OBDM for both payload and platform applications. The design of the memory subsystem is carried out in detail to allow full exploitation of the computational resources in typically resource-constrained space systems.
