Design and Implementation of a Fully Asynchronous SFQ Microprocessor: SCRAM2

Nobumori, Y.; Nishigai, T.; Nakamiya, Kazunori; Yoshikawa, Nobuyuki; Tanaka, Masamitsu; Terai, Hirotaka; Yorozu, Shinichi

doi:10.1109/tasc.2007.898658

Cited by 30 publications

(5 citation statements)

References 10 publications

“…Furthermore, in random number generation using SFQ circuits, the quality of the generated random number train depends on the frequency stability of the input SFQ pulse train [14][15][16]. There are many reports on the applications of multiple on-chip CGs to large-scale SFQ circuit systems such as asynchronous handshaking systems [17,18], the SFQ microprocessors [19,20], and the ERSFQ ALU [21]. In a large-scale SFQ circuit system employing a multichip module [22,23], it is reasonable to implement a clock generation circuit on each chip.…”

Section: Introduction (mentioning)

confidence: 99%

Frequency synchronization of single flux quantum oscillators

Yamanashi

Kinoshita

Yoshikawa

2021

Supercond. Sci. Technol.

We demonstrate the frequency synchronization of multiple single-flux quantum (SFQ) oscillators with different oscillation frequencies. To synchronize these SFQ oscillators, a common constant bias current is supplied to the SFQ oscillators without any bias resistors. When an SFQ oscillator oscillates at a frequency of f, the average voltage across the Josephson junction comprising the SFQ oscillator is fΦ₀, where Φ₀ is the flux quantum in the superconductor. The bias currents supplied to the SFQ oscillators are redistributed to eliminate the average voltage difference output from the SFQ oscillators. As a result, the oscillation frequencies of all the SFQ oscillators are synchronized. Simulation results indicate that SFQ oscillators with an oscillation frequency difference of more than 50 GHz can be synchronized. We experimentally demonstrate the frequency synchronization of two SFQ oscillators composed of circular Josephson transmission lines. Frequency synchronization is expected to contribute toward the development of a low-power stable clock source stabilizing SFQ circuit operation.
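
The relation stated in the abstract above, average voltage = fΦ₀, can be made concrete with a short calculation. The following is a minimal sketch, not taken from the cited paper: the function names are hypothetical, and the only physical input is the standard value of the magnetic flux quantum, Φ₀ = h/(2e) ≈ 2.067833848 × 10⁻¹⁵ Wb.

```python
# Magnetic flux quantum h/(2e) in webers (equivalently volt-seconds).
PHI_0 = 2.067833848e-15


def average_voltage(frequency_hz: float) -> float:
    """Average junction voltage (V) of an oscillator running at frequency_hz.

    Implements V = f * PHI_0, the relation quoted in the abstract.
    """
    return frequency_hz * PHI_0


def frequency_from_voltage(voltage_v: float) -> float:
    """Inverse relation: oscillation frequency (Hz) for a given average voltage."""
    return voltage_v / PHI_0


if __name__ == "__main__":
    # Hypothetical example frequencies, chosen only for illustration.
    for f_ghz in (10, 50, 100):
        v_uv = average_voltage(f_ghz * 1e9) * 1e6
        print(f"{f_ghz:>4} GHz -> {v_uv:.1f} uV")
```

Under this relation, a 50 GHz frequency difference between two oscillators corresponds to an average-voltage difference of roughly 103 µV, which is the quantity the redistributed bias currents would have to cancel.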

“…The bit-serial designs have the lowest complexity; however, their latencies increase linearly with the operand lengths, hardly making them competitive for implementation in 32-/64-bit processors [17], [18]. Bit-serial ALUs were used in 8-bit RSFQ microprocessors [19]-[24], in which an 8 times faster internal clock is still feasible. As an example, an 80 GHz bit-serial ALU was reported in [25].…”

Section: Introduction (mentioning)

confidence: 99%

ERSFQ 8-Bit Parallel Arithmetic Logic Unit

Kirichenko

Vernik

Kamkar

et al. 2019

IEEE Trans. Appl. Supercond.

We have designed and tested a parallel 8-bit ERSFQ arithmetic logic unit (ALU). The ALU design employs wave-pipelined instruction execution and features modular bit-slice architecture that is easily extendable to any number of bits and adaptable to current recycling. A carry signal synchronized with an asynchronous instruction propagation provides the wave-pipeline operation of the ALU. The ALU instruction set consists of 14 arithmetical and logical instructions. It has been designed and simulated for operation up to a 10 GHz clock rate at the 10-kA/cm² fabrication process. The ALU is embedded into a shift-register-based high-frequency testbed with on-chip clock generator to allow for comprehensive high frequency testing for all possible operands. The 8-bit ERSFQ ALU, comprising 6840 Josephson junctions, has been fabricated with MIT Lincoln Lab's 10-kA/cm² SFQ5ee fabrication process featuring eight Nb wiring layers and a high-kinetic inductance layer needed for ERSFQ technology. We evaluated the bias margins for all instructions and various operands at both low and high frequency clock. At low frequency, clock and all instruction propagation through ALU were observed with bias margins of +/-11% and +/-9%, respectively. Also at low speed, the ALU exhibited correct functionality for all arithmetical and logical instructions with +/-6% bias margins. We tested the 8-bit ALU for all instructions up to 2.8 GHz clock frequency.
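
The citation statement quoted in this entry notes that bit-serial latency grows linearly with operand length. A minimal software sketch of a generic bit-serial adder illustrates why; it is an assumption-laden model, not the design of any of the cited processors, and the function name and cycle counter are hypothetical.

```python
def bit_serial_add(a: int, b: int, width: int):
    """Add two unsigned integers one bit per clock cycle, LSB first.

    Models a single full adder with a one-bit carry register.
    Returns (sum modulo 2**width, number of cycles used).
    """
    carry = 0
    result = 0
    cycles = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        # Full-adder equations for the current bit position.
        sum_bit = bit_a ^ bit_b ^ carry
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        result |= sum_bit << i
        cycles += 1  # one internal clock tick per operand bit
    return result, cycles


if __name__ == "__main__":
    for width in (8, 32, 64):
        total, cycles = bit_serial_add(5, 3, width)
        print(f"{width:>2}-bit operands: sum={total}, cycles={cycles}")
```

In this model an 8-bit addition takes 8 internal clock ticks, consistent with the statement's remark about an 8 times faster internal clock, while a 64-bit addition takes 64.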

“…Various 8-bit SFQ microprocessors have been developed in the last two decades, including a bit-serial microprocessor with eight 1-bit serial ALU blocks (FLUX-1) [6], a bit-serial CORE1 processor [7], and a bit-serial SCRAM2 asynchronous microprocessor [8]. More specifically, the arithmetic logic unit (ALU), a critical part of a microprocessor, has gained significant research importance in RSFQ [9], [10], [11], [12].…”

Section: Introduction (mentioning)

confidence: 99%

qBSA: Logic Design of a 32-bit Block-Skewed RSFQ Arithmetic Logic Unit

Kundu

Datta

Beerel

et al. 2019

2019 IEEE International Superconductive Electronics Conference (ISEC)

Single flux quantum (SFQ) circuits are an attractive beyond-CMOS technology because they promise two orders of magnitude lower power at clock frequencies exceeding 25 GHz. However, every SFQ gate is clocked, creating very deep gate-level pipelines that are difficult to keep full, particularly for sequences that include data-dependent operations. This paper proposes to increase the throughput of SFQ pipelines by redesigning the datapath to accept and operate on least-significant bits (LSBs) clock cycles earlier than more significant bits. This skewed datapath approach reduces the latency of the LSB side, which can be fed back earlier for use in subsequent data-dependent operations, increasing their throughput. In particular, we propose to group the bits into 4-bit blocks that are operated on concurrently and create block-skewed datapath units for 32-bit operation. This skewed approach allows a subsequent data-dependent operation to start evaluating as soon as the first 4-bit block completes. Using this general approach, we develop a block-skewed MIPS-compatible 32-bit ALU. Our gate-level Verilog design improves the throughput of 32-bit data-dependent operations by 2x and 1.5x compared to previously proposed 4-bit bit-slice and 32-bit Ladner-Fischer ALUs respectively. We have quantified the benefit of this design on instructions per cycle (IPC) for various RISC-V benchmarks assuming a range of non-ALU operation latencies from one to ten cycles. Averaging across benchmarks, our experimental results show that compared to the 32-bit Ladner-Fischer our proposed architecture provides a range of IPC improvements between 1.37x assuming one-cycle non-ALU latency to 1.2x assuming ten-cycle non-ALU latency. Moreover, our average IPC improvements compared to a 32-bit ALU based on the 4-bit bit-slice range from 2.93x to 4x. Index Terms: Energy efficient computation, RSFQ, arithmetic logic unit (ALU), block-skewed architecture.
