An 8 Bit 12.4 TOPS/W Phase-Domain MAC Circuit for Energy-Constrained Deep Learning Accelerators

We present a 256 × 256 in-memory compute (IMC) core designed and fabricated in 14-nm CMOS technology with backend-integrated multi-level phase change memory (PCM). It comprises 256 linearized current-controlled oscillator (CCO)-based A/D converters (ADCs) at a compact 4-µm pitch and a local digital processing unit (LDPU) performing affine scaling and ReLU operations. A frequency-linearization technique for CCO is introduced, which increases the maximum…


“…to encode multi-bit data in a single physical quantity, such as time [7], [8], electrical current [9]- [12], charge [13]- [16],…”

Section: IEEE Journal of Solid-State Circuits (mentioning)

confidence: 99%

HERMES-Core—A 1.59-TOPS/mm² PCM on 14-nm CMOS In-Memory Compute Core Using 300-ps/LSB Linearized CCO-Based ADCs

Khaddam-Aljameh

Stanisavljević

Mas

et al. 2022

IEEE J. Solid-State Circuits


“…In the figure, we also project average macro-level energy efficiency in TOPs/W. For digital processing, we use 2.8 TOPs/W from [34]. For multiplication-free compute-in-memory processing, we use 105 TOPs/W from Table II.…”

Section: Synergistic Integration Of Digital and Compute-in-memory Pro... (mentioning)

confidence: 99%
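The macro-level projection quoted above combines two efficiency figures (2.8 TOPs/W digital, 105 TOPs/W compute-in-memory). A minimal sketch of one way such a blend could be computed is shown here, assuming energy per operation simply adds across the two kinds of processing. The function name `blended_tops_per_watt` and the harmonic-mean formula are assumptions for illustration, not the citing paper's stated method; the 85% share is taken from the MF-Net abstract ("more than 85% operations").

```python
def blended_tops_per_watt(cim_fraction, cim_eff=105.0, digital_eff=2.8):
    """Average efficiency (TOPS/W) when a fraction of operations runs in CIM.

    Assumption: energy per operation is 1 / efficiency, and the total energy
    is the operation-weighted sum of both parts (a weighted harmonic mean).
    """
    if not 0.0 <= cim_fraction <= 1.0:
        raise ValueError("cim_fraction must lie in [0, 1]")
    energy_per_op = cim_fraction / cim_eff + (1.0 - cim_fraction) / digital_eff
    return 1.0 / energy_per_op


if __name__ == "__main__":
    # With 85% of operations in CIM, the digital remainder dominates the energy.
    print(round(blended_tops_per_watt(0.85), 1))
```

Under this assumption the slower digital portion dominates: even with 85% of operations at 105 TOPs/W, the blended figure stays far closer to the digital value than to the compute-in-memory value.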

MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference using Memory-immersed Data Conversion and Multiplication-free Operators

Nasrin

Badawi

Çetin

et al. 2021

Preprint


We propose a co-design approach for compute-in-memory inference for deep neural networks (DNNs). We use multiplication-free function approximators based on the ℓ1 norm along with a co-adapted processing array and compute flow. Using this approach, we overcome many deficiencies in the current art of in-SRAM DNN processing, such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high-precision analog-to-digital converters (ADCs), limited support for multi-bit precision weights, and limited vector-scale parallelism. Our co-adapted implementation seamlessly extends to multi-bit precision weights, does not require DACs, and easily extends to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC), in which we exploit the parasitic capacitance of the SRAM array's bit lines as a capacitive DAC. Since the dominant area overhead of an SA-ADC comes from its capacitive DAC, exploiting the intrinsic parasitics of the SRAM array allows a low-area implementation of a within-SRAM SA-ADC. Our 8×62 SRAM macro, which requires a 5-bit ADC, achieves ∼105 tera operations per second per Watt (TOPS/W) with 8-bit input/weight processing at 45 nm CMOS. Our 8×30 SRAM macro, which requires a 4-bit ADC, achieves ∼84 TOPS/W. SRAM macros that require lower ADC precision are more tolerant of process variability but also have lower TOPS/W. We evaluated the accuracy and performance of our proposed network on the MNIST, CIFAR10, and CIFAR100 datasets. We chose a network configuration that adaptively mixes multiplication-free and regular operators. The network configurations use the multiplication-free operator for more than 85% of all operations. The selected configurations are 98.6% accurate for MNIST, 90.2% for CIFAR10, and 66.9% for CIFAR100. Since most of the operations in the considered configurations are based on the proposed SRAM macros, the efficiency benefits of our compute-in-memory approach broadly translate to the system level.
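To make the idea of a multiplication-free operator concrete, here is a minimal sketch of one such operator from the multiplication-free network literature: each product w·x is replaced by sign(x)·w + sign(w)·x, which needs only sign flips and additions. Applied to a vector with itself it yields twice the ℓ1 norm. Whether MF-Net uses exactly this form is an assumption; the names `signed_copy` and `mf_dot` are hypothetical.

```python
import numpy as np


def signed_copy(sign_source, value):
    """Return value, -value, or 0 depending on the sign of sign_source.

    Implemented with selection and negation only, no multiplication.
    """
    return np.where(sign_source > 0, value,
                    np.where(sign_source < 0, -value, 0.0))


def mf_dot(w, x):
    """Multiplication-free stand-in for a dot product (assumed operator form).

    Sums sign(x_i) * w_i + sign(w_i) * x_i over all elements.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(signed_copy(x, w) + signed_copy(w, x)))


if __name__ == "__main__":
    v = [1.0, -2.0, 3.0]
    print(mf_dot(v, v))                     # twice the l1 norm of v
    print(mf_dot([1.0, -2.0], [3.0, 4.0]))  # same sign as the true dot product here
```

The result is not numerically equal to the ordinary dot product; the operator keeps the sign behavior of each product while replacing its magnitude with a sum of magnitudes, which is why networks using it must be trained with the operator in place.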


“…1(b), but additionally including a phase-encoded multiplier and accumulator circuit (PMAC) in the first layer of the neural network, making use of ring-oscillators. It has been already proven that using mixed-signal circuits to perform MAC operations leads to a high power efficiency [5], [6]. However, its application to process frequency-encoded signals coming from a VCO-based ADC has not been proposed yet.…”

Section: Input Signal (mentioning)

confidence: 99%

“…The accuracy required in the MAC operation is leveraged by the structure of the neural network, which enables the calculation with approximated analog methods. Structures using a ring-oscillator to perform MAC operations have been reported in [6], but with a digital code as the input rather than signals coming from a VCO-based ADC.…”

Section: B Operations In the Neuron (mentioning)

confidence: 99%

Low Power Phase-Encoded MAC Accelerator for Smart Sensors with VCO-based ADCs

Gutierrez

Pérez

Patón

et al. 2020

2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS)


A new phase-encoded MAC cell is proposed for low-power smart sensing applications. If the raw data is digitized by voltage-controlled-oscillator-based analog-to-digital converters (VCO-based ADCs), the unsampled frequency-encoded output signal can be connected directly to the first layer of a neural network. That layer can then be implemented with phase-encoded MAC accelerators, leading to an energy-efficient solution. The MAC cell performs not only the accumulation/subtraction and multiplication operations but also the non-linear function, which is a significant advantage over other equivalent cells. A circuit example is proposed in a 65-nm CMOS process, and transient simulations demonstrate the feasibility of the approach.
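As a rough behavioral illustration of how a multiply-accumulate can be carried out on frequency-encoded signals, the sketch below models an idealized oscillator whose frequency is f0 + k·x and which is gated on for a time proportional to each weight, so that accumulated phase encodes the weighted sum. This is a generic textbook-style model under stated assumptions, not the circuit from the paper: it handles only non-negative weights, omits the subtraction and non-linear function, and the names and values (`phase_mac`, `f0`, `k`, `t_unit`) are hypothetical.

```python
def phase_mac(inputs, weights, f0=1e9, k=1e8, t_unit=1e-9):
    """Idealized phase-domain MAC (behavioral model, assumed parameters).

    inputs  : values x_i that shift the oscillator frequency to f0 + k * x_i
    weights : non-negative w_i that set the gating time w_i * t_unit
    Returns the weighted sum recovered from the accumulated phase.
    """
    if any(w < 0 for w in weights):
        raise ValueError("this sketch supports non-negative weights only")
    phase_cycles = 0.0   # accumulated oscillator phase, in cycles
    total_time = 0.0     # total time the oscillator was gated on
    for x, w in zip(inputs, weights):
        gate = w * t_unit
        phase_cycles += (f0 + k * x) * gate  # phase is frequency integrated over time
        total_time += gate
    # Subtract the free-running contribution, then rescale to input units.
    return (phase_cycles - f0 * total_time) / (k * t_unit)


if __name__ == "__main__":
    print(phase_mac([0.5, 0.25], [2, 4]))  # approximately 0.5*2 + 0.25*4
```

The point of the model is that multiplication comes for free from integrating a frequency over a time window, and accumulation comes for free from the phase carrying over between windows.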


An 8 Bit 12.4 TOPS/W Phase-Domain MAC Circuit for Energy-Constrained Deep Learning Accelerators

Cited by 25 publications

References 21 publications
