pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning

Sutradhar, Purab Ranjan; Connolly, Mark; Bavikadi, Sathwika; Dinakarrao, Sai Manoj Pudukotai; Indovina, Mark; Ganguly, Amlan

doi:10.1109/lca.2020.3011643

Cited by 29 publications

(9 citation statements)

References 9 publications


“…Lookup table is a widely used method to improve runtime by replacing computation with memory lookup. There are many works (Deng et al, 2019;Sutradhar et al, 2020;Ferreira et al, 2021) try to accelerate deep neural networks with lookup tables by memorizing vector multiplication results. However, due to the huge lookup table size (GB+) required for memorizing all possible results of a vector-vector multiplication, all of them are DRAM based in-memory accelerators, hence they are not software solutions.…”

Section: Lookup Table Based Vector Multiplication Acceleration (mentioning)

confidence: 99%
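The table-size argument in the excerpt above can be checked with a little arithmetic. The Python sketch below is illustrative only and rests on assumptions not stated in the excerpt: unsigned b-bit operands, one table entry per combination of all operands of both vectors, and each result stored in whole bytes. The function names are hypothetical.

```python
# Hypothetical sketch: replacing multiplication with lookup-table (LUT) reads,
# and estimating the size of a table that memoizes a whole dot product.
# Assumptions: unsigned operands of `bits` bits; results stored in whole bytes.

def build_mul_lut(bits):
    """Table of a * b for every pair of unsigned `bits`-bit operands."""
    n = 1 << bits
    return [[a * b for b in range(n)] for a in range(n)]

def lut_dot(lut, xs, ws):
    """Dot product in which every multiply is a table read."""
    return sum(lut[x][w] for x, w in zip(xs, ws))

def vv_table_entries(length, bits):
    """Entries needed to memoize a full dot product of two length-`length`
    vectors: one entry per combination of all 2 * length operands."""
    return 2 ** (2 * length * bits)

def vv_table_bytes(length, bits):
    """Bytes for that table, rounding each result up to whole bytes."""
    max_result = length * ((1 << bits) - 1) ** 2
    bytes_per_entry = (max_result.bit_length() + 7) // 8
    return vv_table_entries(length, bits) * bytes_per_entry
```

Under these assumptions a scalar 4-bit multiply table needs only 256 one-byte entries, while memoizing a complete 4-element, 4-bit dot product needs 2^32 two-byte entries (8 GiB), consistent with the "GB+" scale the excerpt describes.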

Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors

Li; Gupta

2022

Preprint


Applications of neural networks on edge systems have proliferated in recent years, but ever-increasing model sizes prevent neural networks from being deployed efficiently on resource-constrained microcontrollers. We propose bit-serial weight pools, an end-to-end framework that includes network compression and acceleration at arbitrary sub-byte precision. The framework can achieve up to 8× compression compared to 8-bit networks by sharing a pool of weights across the entire network. We further propose a bit-serial lookup-based software implementation that allows a runtime-bitwidth tradeoff and achieves more than 2.8× speedup and 7.5× storage compression compared to 8-bit weight pool networks, with less than 1% accuracy drop.
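The abstract does not spell out how the bit-serial lookup works. One plausible scheme, shown here as a hypothetical Python sketch rather than the authors' implementation, precomputes the sum of every subset of a small group of weights drawn from the shared pool; each activation bit-plane then selects one subset with a single table read, and the partial sums are combined with shifts. The pool values, group size, and function names are assumptions.

```python
# Hypothetical sketch of a bit-serial, lookup-based dot product over a
# shared weight pool. Pool values, group size, and table layout are
# illustrative assumptions, not the framework described in the abstract.

def build_group_table(pool, idx_group):
    """For k pool indices, precompute the sum of the selected weights for
    every k-bit activation-bit pattern (2**k entries)."""
    weights = [pool[i] for i in idx_group]
    k = len(weights)
    table = []
    for pattern in range(1 << k):
        # Bit j of `pattern` decides whether weight j joins the partial sum.
        table.append(sum(w for j, w in enumerate(weights) if (pattern >> j) & 1))
    return table

def bit_serial_dot(pool, idx_group, activations, act_bits):
    """Dot product of unsigned activations (each < 2**act_bits) with the
    weights pool[idx_group], one activation bit-plane at a time."""
    table = build_group_table(pool, idx_group)
    acc = 0
    for b in range(act_bits):
        # Gather bit b of every activation into a k-bit table index.
        pattern = 0
        for j, a in enumerate(activations):
            pattern |= ((a >> b) & 1) << j
        acc += table[pattern] << b  # scale this bit-plane's sum by 2**b
    return acc
```

Because the loop runs once per activation bit, the number of lookups grows linearly with act_bits, which is one way a runtime-bitwidth tradeoff of the kind the abstract mentions could arise.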


“…Besides bitwise operations, DRAM PIM has been shown to significantly improve neural network computation inside memory. For example, by performing operations commonly found in convolutional network networks like the multiply-and-accumulate operation in memory, DRAM PIM can achieve significant speedup over conventional architectures [45].… operations such as addition and multiplication [1,23,43].…”

Section: PIM Using DRAM (mentioning)

confidence: 99%

“…Besides bitwise operations, DRAM PIM has been shown to significantly improve neural network computation inside memory. For example, by performing operations commonly found in convolutional network networks like the multiply-and-accumulate operation in memory, DRAM PIM can achieve significant speed-up over conventional architectures [45]. In order to extract even more performance improvements, [12] places single instruction, multiple data (SIMD) PEs adjacent to the sense amplifiers at the cost of higher area and power per bit of memory.…”

Section: PIM Using DRAM (mentioning)

confidence: 99%

A Survey of Resource Management for Processing-In-Memory and Near-Memory Processing Architectures

Khan; Pasricha; Kim

2020

JLPEA


Due to the amount of data involved in emerging deep learning and big data applications, operations related to data movement have quickly become a bottleneck. Data-centric computing (DCC), as enabled by processing-in-memory (PIM) and near-memory processing (NMP) paradigms, aims to accelerate these types of applications by moving the computation closer to the data. Over the past few years, researchers have proposed various memory architectures that enable DCC systems, such as logic layers in 3D-stacked memories or charge-sharing-based bitwise operations in dynamic random-access memory (DRAM). However, application-specific memory access patterns, power and thermal concerns, memory technology limitations, and inconsistent performance gains complicate the offloading of computation in DCC systems. Therefore, designing intelligent resource management techniques for computation offloading is vital for leveraging the potential offered by this new paradigm. In this article, we survey the major trends in managing PIM and NMP-based DCC systems and provide a review of the landscape of resource management techniques employed by system designers for such systems. Additionally, we discuss the future challenges and opportunities in DCC management.


“…As a result, PuM architectures can provide high compute throughput by performing operations in a bulk parallel manner, often at the granularity of memory rows. Prior PuM works [70,72,74,75,79,82,84,96,97] propose mechanisms for the execution of bulk bitwise operations (e.g., bitwise MAJority, AND, OR, NOT) [72, 74, 78, 80, 82-85, 87, 91, 98] and bulk arithmetic operations [70,75,79,96,97]. However, these proposals have two important limitations: 1) the execution of some complex operations (e.g., multiplication, division) incurs high latency and energy consumption [75], and 2) other complex operations (e.g., exponentiation, trigonometric functions) are not even supported.…”

Section: Introduction (mentioning)

confidence: 99%
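The bulk bitwise operations listed in the excerpt can be modeled in software to show why they map well onto row-wide execution. In this illustrative Python model, which is not taken from any of the cited works, an integer stands in for one DRAM row and ROW_BITS is an arbitrary assumption. It relies on the Boolean identities MAJ(a, b, 0) = AND(a, b) and MAJ(a, b, 1) = OR(a, b), so a single majority step over three rows yields AND or OR when the third row is all zeros or all ones.

```python
# Illustrative model of row-wide bulk bitwise operations.
# A Python int stands in for one memory row; ROW_BITS is an assumption.
ROW_BITS = 64
MASK = (1 << ROW_BITS) - 1
ZEROS, ONES = 0, MASK  # constant rows of all 0s and all 1s

def bulk_maj(a, b, c):
    """Bitwise 3-input majority, applied to every column of the row at once."""
    return (a & b) | (b & c) | (a & c)

def bulk_not(a):
    """Bitwise NOT, kept within the row width."""
    return ~a & MASK

def bulk_and(a, b):
    # MAJ(a, b, 0) is 1 only where both a and b are 1.
    return bulk_maj(a, b, ZEROS)

def bulk_or(a, b):
    # MAJ(a, b, 1) is 1 where at least one of a and b is 1.
    return bulk_maj(a, b, ONES)
```

Each call processes every column of the modeled row in one step. An operation such as multiplication has no comparable one-step form and would have to be composed from many such steps, which is in line with the excerpt's remark about its latency and energy cost.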

pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables

Ferreira; Falcão; Gómez-Luna et al.

2022

2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)


Data movement between the main memory and the processor is a key contributor to execution time and energy consumption in memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM). One category of PiM is Processing-using-Memory (PuM), in which computation takes place inside the memory array by exploiting intrinsic analog properties of the memory device. PuM yields high performance and energy efficiency, but existing PuM techniques support a limited range of operations. As a result, current PuM architectures cannot efficiently perform some complex operations (e.g., multiplication, division, exponentiation) without large increases in chip area and design complexity. To overcome these limitations of existing PuM architectures, we introduce pLUTo (processing-using-memory with lookup table (LUT) operations), a DRAM-based PuM architecture that leverages the high storage density of DRAM to enable the massively parallel storing and querying of lookup tables (LUTs). The key idea of pLUTo is to replace complex operations with low-cost, bulk memory reads (i.e., LUT queries) instead of relying on complex extra logic. We evaluate pLUTo across 11 real-world workloads that showcase the limitations of prior PuM approaches and show that our solution outperforms optimized CPU and GPU baselines by an average of 713× and 1.2×, respectively, while simultaneously reducing energy consumption by an average of 1855× and 39.5×. Across these workloads, pLUTo outperforms state-of-the-art PiM architectures by an average of 18.3×. We also show that different versions of pLUTo provide different levels of flexibility and performance at different additional DRAM area overheads (between 10.2% and 23.1%). pLUTo's source code and all scripts required to reproduce the results of this paper are openly and fully available at https://github.com/CMU-SAFARI/pLUTo.
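The key idea stated in the abstract, replacing a complex operation with bulk LUT queries, can be illustrated with a toy Python model. It captures only the input-output behavior: in pLUTo the tables are stored in DRAM and queried inside the memory array, whereas here a Python list plays the role of both table and memory row. The example function (the high byte of an 8-bit square) and the names build_lut and bulk_lut_query are hypothetical.

```python
# Toy software model of "replace an operation with LUT queries".
# The 8-bit domain, the example function, and treating a Python list as a
# memory row are illustrative assumptions only.

def build_lut(f, in_bits):
    """Precompute f(x) once for every in_bits-bit input x."""
    return [f(x) for x in range(1 << in_bits)]

def bulk_lut_query(lut, row):
    """Answer one query per element of the row; each answer is a table
    read rather than a fresh computation."""
    return [lut[x] for x in row]

# Example operation: the high byte of the square of an 8-bit value.
square_hi = build_lut(lambda x: (x * x) >> 8, 8)
```

Any function of a small input domain can be handled the same way; in this model the cost is table storage of 2^in_bits entries per function, which is why the approach favors low-precision operands.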

