PERCIVAL: Open-Source Posit RISC-V Core With Quire Capability

Mallasén, David; Murillo, Raúl; Barrio, Alberto A. Del; Botella, Guillermo; Piñuel, Luis; Prieto-Matías, Manuel

doi:10.1109/arith54963.2022.00019

Cited by 3 publications

(2 citation statements)

References 0 publications

“…Additionally, the posit format does not include redundancy representations or overflow/underflow cases during operations [50]. Some studies [51][52][53][54] have confirmed that, for some applications, like convolutional neural networks or CNNs, the posit format outperforms the FP one in terms of accuracy. Posit achieves superior accuracy in the range near 1.0, where most computations occur, making it very attractive for deep learning applications.…”

Section: Posit Format (mentioning)

confidence: 99%

Exploring Hardware Fault Impacts on Different Real Number Representations of the Structural Resilience of TCUs in GPUs

Limas Sierra, Guerrero-Balaguera, Condia et al. 2024

Electronics

The most recent generations of graphics processing units (GPUs) boost the execution of convolutional operations required by machine learning applications by resorting to specialized and efficient on-chip accelerators (Tensor Core Units or TCUs) that operate on matrix multiplication tiles. Unfortunately, modern cutting-edge semiconductor technologies are increasingly prone to hardware defects, and the trend to highly stress TCUs during the execution of safety-critical and high-performance computing (HPC) applications increases the likelihood of TCUs producing different kinds of failures. In fact, the intrinsic resiliency to hardware faults of arithmetic units plays a crucial role in safety-critical applications using GPUs (e.g., in automotive, space, and autonomous robotics). Recently, new arithmetic formats have been proposed, particularly those suited to neural network execution. However, the reliability characterization of TCUs supporting different arithmetic formats was still lacking. In this work, we quantitatively assessed the impact of hardware faults in TCU structures while employing two distinct formats (floating-point and posit) and using two different configurations (16 and 32 bits) to represent real numbers. For the experimental evaluation, we resorted to an architectural description of a TCU core (PyOpenTCU) and performed 120 fault simulation campaigns, injecting around 200,000 faults per campaign and requiring around 32 days of computation. Our results demonstrate that the posit format of TCUs is less affected by faults than the floating-point one (by up to three orders of magnitude for 16 bits and up to twenty orders for 32 bits). We also identified the most sensitive fault locations (i.e., those that produce the largest errors), thus paving the way to adopting smart hardening solutions.

“…To address IEEE-754 standard limitations, new computer formats offer different trade-offs, such as Bfloat16 [2], [34], Tapered Floating-Point (TFP) [46], Posit [24], and FP8-E4M3 and FP8alt-E5M2 formats [45]. Studies compare these formats in terms of circuit area and numerical stability [3], [10], [11], [16], [31], [42], [44], [53], [56], [62].…”

Section: Introduction (mentioning)

confidence: 99%

An Open-Source Framework for Efficient Numerically-Tailored Computations

Ledoux, Casas 2023

2023 33rd International Conference on Field-Programmable Logic and Applications (FPL)

We present a versatile open-source framework designed to facilitate efficient, numerically-tailored Matrix-Matrix Multiplications (MMMs). The framework offers two primary contributions: first, a fine-tuned, automated pipeline for arithmetic datapath generation, enabling highly customizable systolic MMM kernels; second, seamless integration of the generated kernels into user code, irrespective of the programming language employed, without necessitating modifications. We employ this framework within a cutting-edge platform, comprising a Power9 host, an OpenCAPI link, and a Xilinx Virtex UltraScale+ FPGA. The framework demonstrates a systematic enhancement in accuracy per energy cost across diverse High Performance Computing (HPC) workloads displaying a variety of numerical requirements, such as Artificial Intelligence (AI) inference and Sea Surface Height (SSH) computation. For AI inference, we consider a set of state-of-the-art neural network models, namely ResNet18, ResNet34, ResNet50, DenseNet121, DenseNet161, DenseNet169, and VGG11, in conjunction with two datasets, two computer formats, and 27 distinct intermediate arithmetic datapaths. Our approach consistently reduces energy consumption across all cases, with a notable example being the reduction by factors of 3.3× for IEEE754-32 and 1.4× for Bfloat16 during ImageNet inference with ResNet50. This is accomplished while maintaining accuracies of 82.3% and 86%, comparable to those achieved with conventional Floating-Point Units (FPUs). In the context of SSH computation, our method achieves fully-reproducible results using double-precision words, surpassing the accuracy of conventional double- and quad-precision arithmetic in FPUs. Our approach enhances SSH computation accuracy by a minimum of 5× and 27× compared to IEEE754-64 and IEEE754-128, respectively, resulting in 5.6× and 15.1× improvements in accuracy per power cost.
