“…Offloading can be performed at different granularities, e.g., instructions (including small groups of instructions) [1,13,16,19,24,25,28,32,37,39,40,42,57,91,92], threads [71], Nvidia's CUDA blocks/warps [27,29], kernels [26], and applications [38,41,73,74]. Instruction-level offloading is often used with fixed-function accelerators and PIM systems [1,13,16,19,24,25,28,29,32,37,39,42,57,92]. For example, [42] offloads atomic instructions at instruction-level granularity to a fixed-function near-memory graph accelerator.…”
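To make the instruction-granularity case concrete, the sketch below is our own illustration (not the mechanism of [42]): each per-neighbor atomic add in a graph-style update is the kind of single instruction that could be shipped to a memory-side unit instead of pulling the cache line to the core. The name `pim_atomic_add` is a hypothetical stand-in for such a memory-side atomic instruction; here it simply falls back to a host atomic so the sketch compiles and runs.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for a near-memory atomic-add instruction.
 * In a real PIM system this single instruction would be offloaded and
 * executed at the memory side; here it falls back to a host atomic so
 * the sketch is runnable. */
static inline void pim_atomic_add(_Atomic long *addr, long val)
{
    atomic_fetch_add_explicit(addr, val, memory_order_relaxed);
}

/* Instruction-granularity offload in a graph update: every per-neighbor
 * accumulation is one atomic that is a candidate for offloading. */
void scatter_contrib(const int *neighbors, int degree,
                     _Atomic long *rank_accum, long contrib)
{
    for (int i = 0; i < degree; i++)
        pim_atomic_add(&rank_accum[neighbors[i]], contrib);
}

int main(void)
{
    _Atomic long rank_accum[4] = {0, 0, 0, 0};
    int neighbors[] = {1, 3, 3};

    scatter_contrib(neighbors, 3, rank_accum, 5);

    for (int v = 0; v < 4; v++)
        printf("vertex %d: %ld\n", v, atomic_load(&rank_accum[v]));
    return 0;
}
```

Coarser granularities (threads, CUDA blocks/warps, kernels, applications) would instead offload the whole loop, the enclosing kernel, or the entire program to the memory-side compute units.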