A Review of In-Memory Computing Architectures for Machine Learning Applications

Bavikadi, Sathwika; Sutradhar, Purab Ranjan; Khasawneh, Khaled N.; Ganguly, Amlan; Dinakarrao, Sai Manoj Pudukotai

doi:10.1145/3386263.3407649

Cited by 48 publications

(10 citation statements)

References 15 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…‘Create ML’ uses transfer learning [ 57 ], capable of applying an existing model (trained with a dataset relevant to one problem) to a completely new problem. The macOS operating system already has extensive machine learning models [ 57 , 62 ] that were created by Apple. ‘Create ML’ uses their patterns to perform a new training using previously extracted features.…”

Section: Methodsmentioning

confidence: 99%

Comparison of the Usability of Apple M1 Processors for Various Machine Learning Tasks

Kasperek

Podpora

Kawala-Sterniuk

2022

Sensors

View full text Add to dashboard Cite

In this paper, the authors have compared all of the currently available Apple MacBook Pro laptops, in terms of their usability for basic machine learning research applications (text-based, vision-based, tabular). The paper presents four tests/benchmarks, comparing four Apple Macbook Pro laptop versions: Intel based (i5) and three Apple based (M1, M1 Pro and M1 Max). A script in the Swift programming language was prepared, whose goal was to conduct the training and evaluation process for four machine learning (ML) models. It used the Create ML framework—Apple’s solution dedicated to ML model creation on macOS devices. The training and evaluation processes were performed three times. While running, the script performed measurements of their performance, including the time results. The results were compared with each other in tables, which allowed to compare and discuss the performance of individual devices and the benefits of the specificity of their hardware architectures.

show abstract

Section: Methodsmentioning

confidence: 99%

Comparison of the Usability of Apple M1 Processors for Various Machine Learning Tasks

Kasperek

Podpora

Kawala-Sterniuk

2022

Sensors

View full text Add to dashboard Cite

show abstract

“…Mandal et al [45] introduce a custom network-on-chip and scheduling method, which reduces the communication latency by 20%-80%. More detailed surveys of the application of IMC to deep learning can be found in [49] and [4]. The successful application of NMC and IMC in deep learning suggests that it will also be useful for deep reinforcement learning in the future.…”

Section: Near-and In-memory Computingmentioning

confidence: 99%

A Survey of Domain-Specific Architectures for Reinforcement Learning

Rothmann

Porrmann

2022

IEEE Access

View full text Add to dashboard Cite

Reinforcement learning algorithms have been very successful at solving sequential decisionmaking problems in many different problem domains. However, their training is often time-consuming, with training times ranging from multiple hours to weeks. The development of domain-specific architectures for reinforcement learning promises faster computation times, decreased experiment turn-around time, and improved energy efficiency. This paper presents a review of hardware architectures for the acceleration of reinforcement learning algorithms. FPGA-based implementations are the focus of this work, but GPU-based approaches are considered as well. Both tabular and deep reinforcement learning algorithms are included in this survey. The techniques employed in different implementations are highlighted and compared. Finally, possible areas for future work are suggested, based on the preceding discussion of existing architectures.

show abstract

“…With an in-memory computation system, the bottleneck and extra power barrier to achieving high bandwidth data transfer between the external memory chip and the processor are significantly minimized using the non-von Neumann architecture [9]. The application of non-volatile memory device technologies such as resistive-switching random access memory (RRAM), phase-change memory (PCM), magnetic random-access memory (MRAM), and ferroelectric random-access memory (FeRAM) are studied for in-memory applications [10]. Here, we intend to study further the application of lower power oxygen vacancybased RRAM for in-memory circuits used for edge-based training to build AI models for the autonomous system.…”

Section: Introductionmentioning

confidence: 99%

Neuromorphic In-Memory RRAM NAND/NOR Circuit Performance Analysis in a CNN Training Framework on the Edge for Low Power IoT

Prabhu

Raghavan

2022

IEEE Access

View full text Add to dashboard Cite

Training a CNN involves computationally intense optimization algorithms to fit the network using a training dataset, to update the network weight for inferencing and then pattern classification. Hence, the application of in-memory computation would enable a highly power-efficient low latency on-the-edge CNN training technique by avoiding the memorywall created during the external memory read/write operation (for off chip instruction and data transfer). A memory writeverify, and re-program technique can control the RRAM variability. Still, memory verification and re-program is a complex process with additional resources needed for practical implementation of verification circuit. In this study, we have demonstrated a practical (First-in Max-Out) FIMO-based cache memory called Maximum Count Binary Comparator Layer (MCBC), using 1T3R, 1T5R, and 1T7R RRAM structures by using a probability-based accuracy improvement architecture, without the conventional verification process. We constructed 10 layered modified MobileNET with filter size ranging from 32 -512 and trained with Traffic Sign Recognition Database (TSRD) using a three-tier abstraction simulation learning framework -(1) High level, 10 layered CNN implementation with Python+TensorFlow; (2) Verilog HDL based FP32MUL and FP32ADD (32-bits Floating Point adder and multiplier) circuits constructed with RRAM NAND gates using 1T2R structures; and (3) Digital Look-Up-Table (LUT) model for RRAM variability. An edge learning framework (for the forward pass) is demonstrated using digital RRAM-NAND/NOR universal gates integrated with the Maximum Count Binary Comparator Layer (MCBC) to partially circumvent the impact of RRAM variability and to quantify the RRAM variability on the CNN training prediction accuracy for 65nm CMOS OxRAM (TiN/HfO2/Hf/TiN) with varying device current compliance of 5, 10, and 50µA for low power IoT applications. The MCBC layer was simulated using a SPICE model, for which the estimated chip layout is 1150 × 1230 nm 2 per logical gate input, which resulted in an overall prediction accuracy improvement from 10% to 60% by repeating the logical operations of the NOR gate for {1, 3, 5, and 7} cycles respectively.

show abstract

A Review of In-Memory Computing Architectures for Machine Learning Applications

Cited by 48 publications

References 15 publications

Comparison of the Usability of Apple M1 Processors for Various Machine Learning Tasks

Comparison of the Usability of Apple M1 Processors for Various Machine Learning Tasks

A Survey of Domain-Specific Architectures for Reinforcement Learning

Neuromorphic In-Memory RRAM NAND/NOR Circuit Performance Analysis in a CNN Training Framework on the Edge for Low Power IoT

Contact Info

Product

Resources

About