Memory Sizing of a Scalable SRAM In-Memory Computing Tile Based Architecture

Gauchi, Roman; Kooli, Maha; Vivet, Pascal; Noël, Jean-Philippe; Beigné, Edith; Mitra, Subhasish; Charles, Henri-Pierre

doi:10.1109/vlsi-soc.2019.8920373

Cited by 21 publications

(13 citation statements)

References 14 publications


“…In this case, several memory instances with a limited word length (e.g. 32- or 64-bit) can be used to reach the targeted data vector length, while sharing the same digital wrapper to limit area and leakage power penalties [8]. A similar partitioning can be done to overcome the limitation of the number of words per SRAM instance, as shown in Fig.…”

Section: Proposed C-SRAM Design Methodology (mentioning)

confidence: 99%

Computational SRAM Design Automation using Pushed-Rule Bitcells for Energy-Efficient Vector Processing

Noël

Egloff

Kooli

et al. 2020

2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Self Cite


This paper presents a new methodology for automating Computational SRAM (C-SRAM) design based on off-the-shelf memory compilers and a configurable RTL IP. The main goal is to drastically reduce the development effort compared to a full-custom design, while offering flexibility of use and high-yield production. The proposed C-SRAM architecture has been developed to process vector data energy-efficiently when coupled with a scalar processor, while limiting data transfers on the system bus. The results obtained by post-P&R simulations show that 2RW and 4RW C-SRAM configurations using the double pumping technique achieved the highest performance on vectorized MAC operations compared to the other configurations. Moreover, it has been shown that the impact of the digital wrapper, which decodes and executes the instructions, can be mitigated by increasing the memory cut size, so that it represents less than 10% of the area and 20% of the power consumption.
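The citation statement for this entry describes reaching a target vector length by placing several memory instances of limited word length side by side. A minimal Python sketch of that sizing arithmetic follows. The function names `instances_for_vector` and `banks_for_depth` are hypothetical, and the example widths and depths are illustrative assumptions, not values taken from the paper.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive operands."""
    if a <= 0 or b <= 0:
        raise ValueError("operands must be positive")
    return -(-a // b)


def instances_for_vector(vector_bits: int, word_bits: int) -> int:
    """Memory instances placed side by side to span one data vector.

    Hypothetical helper: assumes every instance has the same word length.
    """
    return ceil_div(vector_bits, word_bits)


def banks_for_depth(total_words: int, words_per_instance: int) -> int:
    """Instances stacked to exceed the per-instance word-count limit.

    Hypothetical helper: assumes every instance has the same depth.
    """
    return ceil_div(total_words, words_per_instance)


if __name__ == "__main__":
    # Illustrative numbers only: a 128-bit vector built from 32-bit instances.
    print(instances_for_vector(128, 32))
    # Illustrative numbers only: 4096 words built from 1024-word instances.
    print(banks_for_depth(4096, 1024))
```

Under these assumptions the total number of memory cuts is the product of the two results, all sharing one digital wrapper as the statement describes.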



“…General-purpose code compilers such as gcc and clang can be used to program IMC architectures through macros, like [9] for example. The problem is that this solution shows limited expressiveness and code portability, as such a dedicated solution for IMC would be preferable.…”

Section: B Software Solutions For In-Memory Computing (mentioning)

confidence: 99%

Instruction Set Design Methodology for In-Memory Computing through QEMU-based System Emulator

Mambu

Charles

Dumas

et al. 2021

2021 IEEE International Workshop on Rapid System Prototyping (RSP)


In-Memory Computing (IMC) is a promising paradigm to mitigate the von Neumann bottleneck. However, its evaluation on complete applications in the context of full-scale systems is limited by the complexity of simulation frameworks as well as the disjunction between hardware exploration and compiler support. This paper proposes a global exploration flow at the scale of Instruction Set Architectures (ISA) that covers both modeling and the generation of compiler support for ISA-level exploration. Our emulation methodology is based on QEMU, implements a performance model based on hardware characterizations from the state of the art, and allows the modeling of cache hierarchies, while our compiler support is automatically generated and based on a specialized compiler. We evaluate three applications in the domains of image processing and linear algebra on a reference IMC architecture, and analyze the obtained results to validate our methodology.


“…By partitioning the memory into smaller tiles, the Tile Address Mapper allows the architecture to scale, within some limits: energy cost of individual accesses is inversely proportional to the read access time. In terms of physical design, the Tile Address Mapper provides an energy/performance trade-off as long as the number of tiles is limited [9].…”

Section: Inter-tiles Reconfiguration and Vertical Communication (mentioning)

confidence: 99%

“…Regarding the physical scalability of the multi-tile memory architecture, we have evaluated the wiring cost and the correct trade-off between C-SRAM tile size and tile performance. As presented in [9], a wiring cost and energy model based on Place&Route shows the scalability of multiple SRAM tiles: for a total memory size of 256 kB, organized as a 4×16 array of 4 kB tiles, the wiring cost between tiles is about 50% in read access time, while partitioning into an array saves significant dynamic power. The C-SRAM architecture 67.20 * * SRAM memory access in one instruction.…”

Section: Simulation Platform Calibration (mentioning)

confidence: 99%

Reconfigurable tiles of computing-in-memory SRAM architecture for scalable vectorization

Gauchi

Egloff

Kooli

et al. 2020

Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design

Self Cite


For big data applications, bringing computation to the memory is expected to drastically reduce data transfers, which can be done using recent concepts of Computing-In-Memory (CIM). To address kernels with larger memory data sets, we propose a reconfigurable tile-based architecture composed of Computational-SRAM (C-SRAM) tiles, each enabling arithmetic and logic operations within the memory. The proposed horizontal scalability and vertical data communication are combined to select the optimal vector width for maximum performance. These schemes allow vector-based kernels available on existing SIMD engines to be used on the targeted CIM architecture. For architecture exploration, we propose an instruction-accurate simulation platform using SystemC/TLM to quantify the performance and energy of various kernels. For detailed performance evaluation, the platform is calibrated with data extracted from the Place&Route C-SRAM circuit, designed in 22nm FDSOI technology. Compared to a 512-bit SIMD architecture, the proposed CIM architecture achieves an EDP reduction of up to 60× for memory-bound kernels and up to 34× for compute-bound kernels.
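The citation statements for this entry quote a 256 kB memory organized as a 4×16 array of 4 kB tiles and refer to a Tile Address Mapper. As a rough illustration of how such a partitioning could be addressed, the Python sketch below checks the size arithmetic and maps a flat byte address to a tile position. The row-major linear mapping, the `TileLocation` type, and the `map_address` function are assumptions made for illustration. The actual mapper design is not described on this page.

```python
from typing import NamedTuple


class TileLocation(NamedTuple):
    """Position of a byte inside a hypothetical tile array."""
    row: int
    col: int
    offset: int


def total_size_kb(rows: int, cols: int, tile_kb: int) -> int:
    """Total capacity of a rows x cols array of equally sized tiles."""
    return rows * cols * tile_kb


def map_address(addr: int, rows: int = 4, cols: int = 16,
                tile_bytes: int = 4 * 1024) -> TileLocation:
    """Map a flat byte address to (row, col, offset).

    Assumption: tiles are filled one after another in row-major order.
    This is an illustrative scheme, not the mapper from the cited work.
    """
    capacity = rows * cols * tile_bytes
    if not 0 <= addr < capacity:
        raise ValueError(f"address {addr} outside 0..{capacity - 1}")
    tile_index, offset = divmod(addr, tile_bytes)
    row, col = divmod(tile_index, cols)
    return TileLocation(row, col, offset)


if __name__ == "__main__":
    # 4 x 16 tiles of 4 kB each, as quoted in the citation statement.
    print(total_size_kb(4, 16, 4))
    print(map_address(16 * 4096 + 5))
```

A row-major layout keeps consecutive addresses inside one tile, so a single access touches one tile only. An interleaved layout would spread consecutive words across tiles instead, which is a different trade-off that this sketch does not model.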

