Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores

Schuiki, Fabian; Zaruba, Florian; Hoefler, Torsten; Benini, Luca

doi:10.1109/tc.2020.2987314

Cited by 26 publications

(20 citation statements)

References 29 publications


“…In order to store the intermediate data, like the modern processor architectures [15], [18], the ART-9 core also includes a ternary register file (TRF) including nine general-purposed registers, each of which is accessed by using a 2-trit value. Utilizing the load-store architecture used for typical RISC processors [19], there are four instruction categories in ART-9 ISA; R-type, I-type, B-type, and M-type.…”

Section: A. ART-9 Instruction Set Architecture (mentioning)

confidence: 99%
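
The register addressing in the statement above follows from simple counting: two ternary digits give 3^2 = 9 distinct values, one per general-purpose register. The sketch below only illustrates that arithmetic. It assumes unbalanced trits (0, 1, 2) with the most significant trit first, which the quoted text does not specify, and the helper names are hypothetical rather than taken from the ART-9 paper.

```python
from itertools import product

# Assumption: unbalanced ternary digits; the source does not state the encoding.
TRIT_VALUES = (0, 1, 2)


def trits_to_index(trits):
    """Interpret a most-significant-first sequence of trits as an integer."""
    index = 0
    for t in trits:
        if t not in TRIT_VALUES:
            raise ValueError(f"not a trit: {t!r}")
        index = index * 3 + t
    return index


def addressable_registers(width):
    """Number of distinct register indices a field of `width` trits can name."""
    return 3 ** width


# Enumerate every 2-trit pattern and the register index it would select.
all_addresses = sorted(
    trits_to_index(pattern) for pattern in product(TRIT_VALUES, repeat=2)
)
```

Under these assumptions, `addressable_registers(2)` evaluates to 9 and `all_addresses` covers the indices 0 through 8 exactly once, matching the nine registers mentioned in the citation statement.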

“…As shown in Fig. 4, similar to the lightweight RISC-type designs [19], there are five stages for fetching the instruction from TIM (IF), decoding the fetched instruction (ID), executing the arithmetic/logical operations (EX), accessing the TDM (MEM), and updating the result to TRF (WB). The ternary pipelined registers are newly developed to keep the results from each stage, making a balanced pipelined processing.…”

Section: B. 5-Stage Pipelined ART-9 Architecture (mentioning)

confidence: 99%


Design and Evaluation Frameworks for Advanced RISC-based Ternary Processor

Kam, Min, Yoon et al. 2021

Preprint


In this paper, we introduce the design and verification frameworks for developing a fully-functional emerging ternary processor. Based on the existing compiling environments for binary processors, for the given ternary instructions, the software-level framework provides an efficient way to convert the given programs to the ternary assembly codes. We also present a hardware-level framework to rapidly evaluate the performance of a ternary processor implemented in arbitrary design technology. As a case study, the fully-functional 9-trit advanced RISC-based ternary (ART-9) core is newly developed by using the proposed frameworks. Utilizing 24 custom ternary instructions, the 5-stage ART-9 prototype architecture is successfully verified by a number of test programs including the Dhrystone benchmark in a ternary domain, achieving the processing efficiency of 57.8 DMIPS/W and 3.06 × 10^6 DMIPS/W in the FPGA-level ternary-logic emulations and the emerging CNTFET ternary gates, respectively.


“…Aiming to maximize the compute/control ratio (making the FPU the dominant part of the design) mitigating the effects of deep pipelines and dynamic scheduling. 2) An ISA extension, originally proposed by Schuiki et al. [18], called stream semantic register (SSR). This extension accelerates data-oblivious [19] problems by providing an efficient semantic to read and write from memory.…”

Section: Contributions (mentioning)

confidence: 99%

“…Achieving 3.5× more energy efficiency and 4.5× better FPU utilization on small matrices than the current state of the art. 2) An implementation of the SSR [18] enhanced with shadow registers to allow overlapping loop-setup with ongoing operations using the FREP extension enabling the usage of our SSR and FREP extensions on more irregular kernels such as Fast Fourier Transform (FFT). Achieving speed-ups of 4.7× in the single-core case and close to 3× in the parallel octa-core case for the FFT benchmark.…”

Section: Contributions (mentioning)

confidence: 99%


Snitch: A tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads

Zaruba, Schuiki, Hoefler et al. 2020

Preprint

Self Cite


Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations per second on emerging systems. With increasing integration density, the quest for energy efficiency becomes the number one design concern. While dedicated accelerators provide high energy efficiency, they are over-specialized and hard to adjust to algorithmic changes. We propose an architectural concept that tackles the issues of achieving extreme energy efficiency while still maintaining high flexibility as a general-purpose compute engine. The key idea is to pair a tiny 10 kGE control core, called Snitch, with a double-precision FPU to adjust the compute to control ratio. While traditionally minimizing non-FPU area and achieving high floating-point utilization has been a trade-off, with Snitch, we achieve them both, by enhancing the ISA with two minimally intrusive extensions: stream semantic registers (SSR) and a floating-point repetition instruction (FREP). SSRs allow the core to implicitly encode load/store instructions as register reads/writes, eliding many explicit memory instructions. The FREP extension decouples the floating-point and integer pipeline by sequencing instructions from a micro-loop buffer. These ISA extensions significantly reduce the pressure on the core and free it up for other tasks, making Snitch and FPU effectively dual-issue at a minimal incremental cost of 3.2%. The two low overhead ISA extensions make Snitch more flexible than a contemporary vector processor lane, achieving a 2× energy-efficiency improvement. We have evaluated the proposed core and ISA extensions on an octa-core cluster in 22 nm technology. We achieve more than 5× multi-core speed-up and a 3.5× gain in energy efficiency on several parallel microkernels.
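
The abstract's description of SSRs (register reads and writes that implicitly perform loads and stores) can be made concrete with a toy software model. The sketch below is an illustration only: the `StreamRegister` class, its parameters, and the two dot-product helpers are hypothetical names, and the model ignores the real hardware interface, configuration registers, and timing. It contrasts a loop that issues an explicit load per operand with one that only reads from pre-configured streams.

```python
class StreamRegister:
    """Toy read stream: every read returns the next element of an
    affine address pattern (base, base + stride, ...)."""

    def __init__(self, memory, base, stride, length):
        self.memory = memory
        self.addr = base
        self.stride = stride
        self.remaining = length

    def read(self):
        # The memory access happens as a side effect of the register read.
        if self.remaining == 0:
            raise RuntimeError("stream exhausted")
        value = self.memory[self.addr]
        self.addr += self.stride
        self.remaining -= 1
        return value


def dot_explicit(memory, a_base, b_base, n):
    """Baseline: the loop body issues two explicit loads per multiply-add."""
    acc, explicit_loads = 0.0, 0
    for i in range(n):
        a = memory[a_base + i]
        b = memory[b_base + i]
        explicit_loads += 2
        acc += a * b
    return acc, explicit_loads


def dot_streamed(memory, a_base, b_base, n):
    """Stream version: the loop body contains only the multiply-add;
    operand fetches are implied by reading the stream registers."""
    stream_a = StreamRegister(memory, a_base, 1, n)
    stream_b = StreamRegister(memory, b_base, 1, n)
    acc = 0.0
    for _ in range(n):
        acc += stream_a.read() * stream_b.read()
    return acc, 0  # no explicit load instructions in the loop body


memory = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
explicit_result = dot_explicit(memory, 0, 3, 3)
streamed_result = dot_streamed(memory, 0, 3, 3)
```

Both helpers compute the same dot product. The difference the model is meant to show is that the streamed loop body consists solely of the arithmetic operation, which is the effect the abstract attributes to eliding explicit memory instructions.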


TCADer: A Tightly Coupled Accelerator Design framework for heterogeneous system with hardware/software co-design

Liu, Xiao et al. 2023

Journal of Systems Architecture

