Data movement between processing and memory units is the root cause of the limited performance and energy efficiency of modern von Neumann systems. To overcome this data-movement bottleneck, we present the memristive Memory Processing Unit (mMPU), a true processing-in-memory system in which computation is performed directly within the memory cells, eliminating the need for data transfer. Moreover, with its enormous internal parallelism, the system is ideal for data-intensive applications based on single instruction, multiple data (SIMD) processing, providing high throughput and energy efficiency.

Modern computers are typically based on the von Neumann architecture, in which the memory is separated from the processing unit and programs execute by moving data between the two. This incessant data movement is the leading cause of the performance bottleneck known as the memory wall, which has grown more severe over the years as CPU speed improvements have outpaced those of memory speed and bandwidth. Furthermore, with the demise of Dennard scaling, energy efficiency has become a major concern in modern computers; for example, moving data to off-chip DRAM consumes four orders of magnitude more energy than the computation itself.1

One approach to addressing the challenges arising from data movement is to move the computation closer to the memory. Both DRAM and emerging non-volatile memory technologies offer ample intrinsic parallelism, which goes unutilized today because of pin-limited integrated-circuit interfaces. Processing in memory (PIM) can exploit this intrinsic parallelism by avoiding high-latency, high-energy chip-to-chip data transfers, yielding massively parallel, high-performance, energy-efficient processing systems.
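To make the SIMD claim concrete, the following is a minimal toy model of in-memory bulk bitwise logic. It is an illustrative sketch only, not the actual mMPU circuit design: each memory column is modeled as a Python integer whose bits stand for the cells of that column across all rows, so a single logical operation acts on every row at once, mimicking the row-parallel execution that gives PIM its throughput. The choice of NOR as the primitive is an assumption here (stateful NOR is a common building block in memristive-logic proposals).

```python
# Toy model of SIMD-style in-memory bitwise logic (illustrative only;
# not the real mMPU design). A "column" is an integer whose bit i is
# the value stored in row i of that column.
ROWS = 8
MASK = (1 << ROWS) - 1  # keep results within ROWS bits

def simd_nor(col_a: int, col_b: int) -> int:
    """One 'cycle' of NOR applied to all ROWS rows in parallel."""
    return ~(col_a | col_b) & MASK

# Example: two stored columns, one operation covers all eight rows.
a = 0b10110010
b = 0b01100110
result = simd_nor(a, b)
print(f"{result:08b}")  # -> 00001001 (row-wise NOR of a and b)
```

The point of the model is that the cost of `simd_nor` is independent of the number of rows: widening `ROWS` widens the parallelism without adding operations, which is the property the mMPU exploits in hardware.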