Sparse Matrix Multiplication On An Associative Processor

Yavits, Leonid; Morad, Amir; Ginosar, Ran

doi:10.1109/tpds.2014.2370055

Cited by 29 publications

(9 citation statements)

References 38 publications

(76 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The scaling of CMOS associative processors [16], [17] is limited due to the CMOS CAM density. However, an MRAM AP cell is at least an order-of-magnitude smaller, thus paving the way for associative in-memory computing at scale.…”

Section: B Associative Processormentioning

confidence: 99%

“…Since modern datasets become increasingly sparse, the ability of computers to properly process sparse data (for example, not wasting time and energy on fetching and multiplying zero-data elements) becomes a critical requirement. An AP holds a significant intrinsic advantage in sparse data processing: sparse data in one of the compressed formats (such as compressed sparse row or compressed sparse column) can be processed almost as efficiently as dense data [17].…”

Section: Application Spacementioning

confidence: 99%

See 1 more Smart Citation

AM⁴: MRAM Crossbar Based CAM/TCAM/ACAM/AP for In-Memory Computing

Garzón

Lanuzza

Teman

et al. 2023

IEEE J. Emerg. Sel. Topics Circuits Syst.

View full text Add to dashboard Cite

In-memory computing seeks to minimize data movement and alleviate the memory wall by computing in-situ, in the same place that the data is located. One of the key emerging technologies that promises to enable such computing-inmemory is spin-transfer torque magnetic tunnel junction (STT-MTJ). This paper proposes AM 4 , a combined STT-MTJ-based Content Addressable Memory (CAM), Ternary CAM (TCAM), approximate matching (similarity search) CAM (ACAM), and in-memory Associative Processor (AP) design, inspired by the recently announced Samsung MRAM crossbar. We demonstrate and evaluate the performance and energy-efficiency of the AM 4based AP using a variety of data intensive workloads. We show that an AM 4 -based AP outperforms state-of-the-art solutions both in performance (with the average speedup of about 10 ×) and energy-efficiency (by about 60 × on average).

show abstract

Section: B Associative Processormentioning

confidence: 99%

Section: Application Spacementioning

confidence: 99%

AM⁴: MRAM Crossbar Based CAM/TCAM/ACAM/AP for In-Memory Computing

Garzón

Lanuzza

Teman

et al. 2023

IEEE J. Emerg. Sel. Topics Circuits Syst.

View full text Add to dashboard Cite

show abstract

“…APs have been explored for many applications such as matrix multiplication [11], [12], fast Fourier transform (FFT) [13], discrete Fourier transform (DCT) and video application [43], DNA sequence alignment [14], stencil applications [44], convolution operation [9], solution of path problems [45], optimum branchings [46], databases applications [47] and computer vision [48]. The old applications need to be revisited and re-evaluate under the new AP design approaches and technologies besides exploring new applications that could benefit from the AP.…”

Section: B Promising Applicationsmentioning

confidence: 99%

In-memory Associative Processors: Tutorial, Potential, and Challenges

Fouda¹,

Yantır²,

Eltawil³

et al. 2022

Preprint

View full text Add to dashboard Cite

In-memory computing is an emerging computing paradigm that overcomes the limitations of exiting Von-Neumann computing architectures such as the memory-wall bottleneck. In such paradigm, the computations are performed directly on the data stored in the memory, which eliminates the need for memory-processor communications. Hence, orders of magnitude speedup could be achieved especially with data-intensive applications. Associative processors (APs) were proposed since the seventies and recently were revived thanks to the high-density memories. In this tutorial brief, We overview the functionalities and recent trends of APs in addition to the implementation of each Content-addressable memory with different technologies. The AP operations and runtime complexity are also summarized. We also explain and explore the possible applications that can benefit from APs. Finally, the AP limitations, challenges and future directions are discussed.

show abstract

“…For this reason, their best usage is in applications that have an inherent SIMD (single-instruction multiple-data) computational pattern. Fast Fourier transform (FFT) [15], DNA sequence alignment [16], stencil [17], and matrix multiplication [18], [19] are some example applications that can benefit from AP. In an analogy, APs can be considered as a next step on the path of the CPU (central processing unit) to GPU (graphical processing unit) transformation.…”

Section: Content Addressable Memorymentioning

confidence: 99%

IMCA: An Efficient In-Memory Convolution Accelerator

Yantır

Eltawil

Salama

2021

IEEE Trans. VLSI Syst.

View full text Add to dashboard Cite

Traditional convolutional neural network (CNN) architectures suffer from two bottlenecks; computational complexity and memory access cost. In this study, an efficient in-memory convolution accelerator (IMCA) is proposed based on associative in-memory processing to alleviate these two problems directly. In the IMCA, the convolution operations are directly performed inside the memory as in-place operations. The proposed memory computational structure allows for a significant improvement in computational metrics, namely, TOPS/W. Furthermore, due to its unconventional computation style, the IMCA can take advantage of many potential opportunities such as constant multiplication, bit-level sparsity, and dynamic approximate computing, which, while supported by traditional architectures, require extra overhead to exploit, thus reducing any potential gains. The proposed accelerator architecture exhibits a significant efficiency in terms of area and performance, achieving around 0.65 GOPS and 1.64 TOPS/W at 16-bit fixed-point precision with an area less than 0.25 mm 2 .

show abstract

Sparse Matrix Multiplication On An Associative Processor

Cited by 29 publications

References 38 publications

AM⁴: MRAM Crossbar Based CAM/TCAM/ACAM/AP for In-Memory Computing

AM⁴: MRAM Crossbar Based CAM/TCAM/ACAM/AP for In-Memory Computing

In-memory Associative Processors: Tutorial, Potential, and Challenges

IMCA: An Efficient In-Memory Convolution Accelerator

Contact Info

Product

Resources

About

Sparse Matrix Multiplication On An Associative Processor

Cited by 29 publications

References 38 publications

AM4: MRAM Crossbar Based CAM/TCAM/ACAM/AP for In-Memory Computing

AM4: MRAM Crossbar Based CAM/TCAM/ACAM/AP for In-Memory Computing

In-memory Associative Processors: Tutorial, Potential, and Challenges

IMCA: An Efficient In-Memory Convolution Accelerator

Contact Info

Product

Resources

About

AM⁴: MRAM Crossbar Based CAM/TCAM/ACAM/AP for In-Memory Computing

AM⁴: MRAM Crossbar Based CAM/TCAM/ACAM/AP for In-Memory Computing