HERMES Core – A 14nm CMOS and PCM-based In-Memory Compute Core using an array of 300ps/LSB Linearized CCO-based ADCs and local digital processing

Khaddam-Aljameh, Riduan; Stanisavljević, Miloš; Mas, Jordi Fornt; Karunaratne, Geethan; Braendli, Matthias; Liu, F.; Singh, Abhairaj; Müller, Silvia M.; Egger, Urs; Πετρόπουλος, Αναστάσιος; Antonakopoulos, Theodore; Brew, Kevin; Choi, S.; Ok, I.; Lie, Fee Li; Saulnier, Nicole; Chan, V.; Ahsan, Ishtiaq; Narayanan, V.; Nandakumar, S. R.; Gallo, Manuel Le; Francese, Pier Andrea; Sebastian, Abu; Eleftheriou, Evangelos

doi:10.23919/vlsicircuits52068.2021.9492362

Cited by 82 publications

(43 citation statements)

References 2 publications


“…One important feature is the nonlinearity that can be observed in figure 12 when compared with the linearity of the currents. This was also shown previously in [34] and is due to the different transfer functions of 1T1R crossbar and ADC [44,45]. While the non-overlapping currents from figure 9(a) can also be distinguished at the stage of the ADC in the number of generated pulses, the overlap in the measured currents in figure 9(b) also translates into an overlap in the number of generated pulses in figure 12(b).…”

Section: Effect Of LRS and HRS Variability On The VMM

supporting

confidence: 81%

Reliability aspects of binary vector-matrix-multiplications using ReRAM devices

Bengel, Mohr, Wiefels et al. 2022

Neuromorph. Comput. Eng.

Computation-In-Memory (CIM) using memristive devices is a promising approach to overcome the performance limitations that the von Neumann bottleneck, also known as the memory wall and power wall, imposes on conventional computing architectures. It has been shown that accelerators based on memristive devices can deliver higher energy efficiency and data throughput than conventional architectures. Among the many kinds of memristive devices, bipolar resistive switches (BRS) based on the valence change mechanism (VCM) are particularly interesting due to their low-power operation, non-volatility, high integration density and CMOS compatibility. While a wide range of possible applications is considered, many of them, such as artificial neural networks, rely heavily on Vector-Matrix-Multiplications (VMMs) as a mathematical operation. These VMMs are made up of large numbers of Multiplication and Accumulation (MAC) operations. The MAC operation can be realised with memristive devices in an analog fashion using Ohm's law and Kirchhoff's law. However, VCM devices exhibit a range of non-idealities that affect the VMM performance, which in turn impacts the overall accuracy of the application. Those non-idealities can be classified into time-independent (programming variability) and time-dependent (read disturb and read noise). Additionally, peripheral circuits such as Analog to Digital Converters (ADCs) can introduce errors during digitization. In this work, we experimentally and theoretically investigate the impact of device- and circuit-level effects on the VMM in VCM crossbars. Our analysis shows that the variability of the Low Resistive State (LRS) plays a key role and that reading in the RESET direction should be favored over reading in the SET direction.
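The analog MAC operation described in this abstract can be illustrated with a minimal sketch. It assumes an idealized binary crossbar in which each cell contributes a current V_i * G_ij (Ohm's law) and each column sums its cell currents (Kirchhoff's current law). The conductance values, read voltage, Gaussian LRS spread and function names are hypothetical placeholders chosen for illustration; they are not taken from the cited paper.

```python
import random

def analog_vmm(voltages, conductances):
    """Column currents I_j = sum_i V_i * G_ij.

    Ohm's law gives the per-cell current, Kirchhoff's current law
    sums the cell currents along each column.
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]

def program_conductances(bits, g_lrs, g_hrs, lrs_sigma, rng):
    """Map binary weights to conductances.

    Assumption: only LRS cells receive a Gaussian relative spread;
    HRS cells are kept fixed to keep the sketch short.
    """
    return [
        [g_lrs * (1.0 + rng.gauss(0.0, lrs_sigma)) if b else g_hrs for b in row]
        for row in bits
    ]

# Hypothetical device and read parameters (illustrative only).
G_LRS = 100e-6   # siemens
G_HRS = 1e-6     # siemens
V_READ = 0.2     # volts

weights = [[1, 0], [1, 1], [0, 1], [1, 0]]   # 4 rows x 2 columns
inputs = [1, 1, 0, 1]                        # binary input vector
voltages = [V_READ * x for x in inputs]

# Same weights, programmed once without and once with LRS variability.
ideal = analog_vmm(
    voltages, program_conductances(weights, G_LRS, G_HRS, 0.0, random.Random(0))
)
noisy = analog_vmm(
    voltages, program_conductances(weights, G_LRS, G_HRS, 0.1, random.Random(0))
)

for j, (i_ideal, i_noisy) in enumerate(zip(ideal, noisy)):
    print(f"column {j}: ideal {i_ideal * 1e6:.2f} uA, with LRS spread {i_noisy * 1e6:.2f} uA")
```

Because the LRS conductance dominates each column current in this toy model, any spread in it shifts the summed current directly, which is one way to picture why LRS variability matters for distinguishing neighbouring MAC results.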


“…One drawback with this approach is that only a single bit can be stored in an SRAM cell. An alternative is to adopt AIMC based on non-volatile memory technologies, including 2D [27] and 3D Flash [28], phase-change memory (PCM) [13], and resistive random-access memory (RRAM) [12]. These technologies offer analog data storage capability, i.e.…”

Section: Background On Analog In-memory Acceleration

mentioning

confidence: 99%

“…Table I-(C) reports the performance and energy metrics of the AIMC tile estimated from hardware measurements and chip designs in 14 nm technology node [13], [36]. For compatibility with the core and cache model in 28 nm node, we upscale the AIMC tile power estimates with a scaling factor of 5.3x for the high-power system and 2x for the low-power system.…”

Section: B. AIMC Setup and Modeling

mentioning

confidence: 99%

ALPINE: Analog In-Memory Acceleration with Tight Processor Integration for Deep Learning

Klein, Boybat, Qureshi et al. 2022

Preprint

Analog in-memory computing (AIMC) cores offer significant performance and energy benefits for neural network inference compared with digital logic (e.g., CPUs). AIMC cores accelerate matrix-vector multiplications, which dominate these applications' run-time. However, AIMC-centric platforms lack the flexibility of general-purpose systems, as they often have hardcoded data flows and can only support a limited set of processing functions. With the goal of bridging this gap in flexibility, we present a novel system architecture that tightly integrates analog in-memory computing accelerators into multi-core CPUs in general-purpose systems. We developed ALPINE, a full system-level simulation framework built into the gem5-X simulator, which enables an in-depth characterization of the proposed architecture. ALPINE allows the simulation of the entire computer architecture stack, from major hardware components to their interactions with the Linux OS. Within ALPINE, we have defined a custom ISA extension and a software library to facilitate the deployment of inference models. We showcase and analyze a variety of mappings of different neural network types, and demonstrate up to 20.5x/20.8x performance/energy gains with respect to a SIMD-enabled ARM CPU implementation for convolutional neural networks, multi-layer perceptrons, and recurrent neural networks.


“…Compute-in-memory (CIM) based on resistive random-access memory (RRAM) [1] promises to meet such demand by storing AI model weights in dense, analogue and non-volatile RRAM devices, and by performing AI computation directly within RRAM, thus eliminating power-hungry data movement between separate compute and memory [2–5]. Although recent studies have demonstrated in-memory matrix-vector multiplication on fully integrated RRAM-CIM hardware [6–17], it remains a goal for a RRAM-CIM chip to simultaneously deliver high energy efficiency, versatility to support diverse models and software-comparable accuracy. Although efficiency, versatility and accuracy are all indispensable for broad adoption of the technology, the inter-related trade-offs among them cannot be addressed by isolated improvements on any single abstraction level of the design.…”

mentioning

confidence: 99%

“…More recent studies have demonstrated fully integrated RRAM complementary metal-oxide-semiconductor (CMOS) chips capable of performing in-memory matrix-vector multiplication (MVM) [6–17]. However, for a RRAM-CIM chip to be broadly adopted in practical AI applications, it needs to simultaneously deliver high energy efficiency, the flexibility to support diverse AI model architectures and software-comparable inference accuracy.…”

mentioning

confidence: 99%

A compute-in-memory chip based on resistive random-access memory

Wan, Kubendran, Schaefer et al. 2022

Nature

Realizing increasingly complex artificial intelligence (AI) functionalities directly on edge devices calls for unprecedented energy efficiency of edge hardware. Compute-in-memory (CIM) based on resistive random-access memory (RRAM) [1] promises to meet such demand by storing AI model weights in dense, analogue and non-volatile RRAM devices, and by performing AI computation directly within RRAM, thus eliminating power-hungry data movement between separate compute and memory [2–5]. Although recent studies have demonstrated in-memory matrix-vector multiplication on fully integrated RRAM-CIM hardware [6–17], it remains a goal for a RRAM-CIM chip to simultaneously deliver high energy efficiency, versatility to support diverse models and software-comparable accuracy. Although efficiency, versatility and accuracy are all indispensable for broad adoption of the technology, the inter-related trade-offs among them cannot be addressed by isolated improvements on any single abstraction level of the design. Here, by co-optimizing across all hierarchies of the design from algorithms and architecture to circuits and devices, we present NeuRRAM, a RRAM-based CIM chip that simultaneously delivers versatility in reconfiguring CIM cores for diverse model architectures, energy efficiency that is two times better than previous state-of-the-art RRAM-CIM chips across various computational bit-precisions, and inference accuracy comparable to software models quantized to four-bit weights across various AI tasks, including accuracy of 99.0 percent on MNIST [18] and 85.7 percent on CIFAR-10 [19] image classification, 84.7-percent accuracy on Google speech command recognition [20], and a 70-percent reduction in image-reconstruction error on a Bayesian image-recovery task.
