Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques 2018
DOI: 10.1145/3243176.3243188
In-DRAM near-data approximate acceleration for GPUs

Cited by 23 publications (11 citation statements) · References 69 publications
“…Representative work on general-purpose near-data computing architectures includes: TOP-PIM [15] from AMD Research, TOM [16] from Carnegie Mellon University, DRAMA [28] and NDA [7] from the University of Wisconsin-Madison, PEI [13] from Seoul National University, AMC (active memory cube) [29] and a near-data computing system built on multi-core CPUs [30] from IBM Research, HRL [14] from Stanford University, concurrent data structures designed for near-data computing [31] from Brown University, AxRAM [32] from the Georgia Institute of Technology, and proPRAM [33] from the Chinese Academy of Sciences, detailed below. Figure 5: an example of an NDC system [15]. Figure 6: the architecture of TOM [16]. …limits, and thoroughly analyze the performance and energy characteristics of a large number of applications.…”
Section: General-purpose near-data computing architectures (unclassified)
“…In an NMP system with 3D memory cubes, the processing capability is placed in the base logic die under a stack of DRAM layers to utilize the ample internal bandwidth [5]. Later research also proposes near-bank processing, with logic placed near the memory banks in the same DRAM layer to exploit even higher bandwidth [20,21], such as the recently announced FIMDRAM [22] from Samsung. Recent proposals [23,24,25,26,27] have also explored augmenting traditional DIMMs with computation in the buffer die to provide low-cost but bandwidth-limited NMP solutions.…”
Section: Near-memory Processing (mentioning)
confidence: 99%
“…For workloads suffering from either limited DRAM bandwidth or long DRAM access latency on GPUs, near-bank computing is a promising architecture for alleviating these performance bottlenecks because of its abundant bank-level memory bandwidth and reduced memory access latency. However, prior near-bank computing accelerators [3], [23], [67], [76] are domain-customized: they have simple data paths, application-specific mapping strategies, and inefficient support for general-purpose programming languages. This lack of programmability confines these accelerators to a niche application market, adding non-recurring engineering costs in manufacturing.…”
Section: Motivation (mentioning)
confidence: 99%
“…This solution provides only a mediocre bandwidth improvement because intra-stack memory accesses are still bounded by the limited number of through-silicon vias (TSVs) between the memory dies and the base logic die. To overcome this TSV bandwidth bottleneck, recent near-bank accelerators [3], [23], [67], [76] move simple arithmetic units closer to the DRAM banks to harvest the abundant bank-internal bandwidth (around 10× that of the process-on-logic-die solution [23]).…”
Section: Introduction (mentioning)
confidence: 99%
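The roughly 10× bank-internal bandwidth advantage cited in the statement above can be illustrated with a back-of-envelope calculation. The stack and bank parameters below are hypothetical assumptions chosen only to make the arithmetic concrete; they are not figures from the paper:

```python
# Back-of-envelope model of the bandwidth hierarchy discussed above.
# All parameters are illustrative assumptions, not values from the paper.

def aggregate_bandwidth(units: int, per_unit_gbps: float) -> float:
    """Total bandwidth (GB/s) when `units` channels each deliver `per_unit_gbps`."""
    return units * per_unit_gbps

# Assumed 3D stack: base-logic-die accelerators are fed through a limited
# number of TSV channels.
tsv_bw = aggregate_bandwidth(units=32, per_unit_gbps=10.0)    # 320 GB/s

# Assumed near-bank design: many banks are tapped directly, so bandwidth
# scales with the bank count instead of the TSV count.
bank_bw = aggregate_bandwidth(units=256, per_unit_gbps=12.5)  # 3200 GB/s

ratio = bank_bw / tsv_bw
print(f"TSV-limited: {tsv_bw:.0f} GB/s, bank-internal: {bank_bw:.0f} GB/s, "
      f"ratio: {ratio:.1f}x")
```

Under these assumed parameters the near-bank organization reaches 10× the TSV-limited bandwidth; the key point is that bank-internal bandwidth scales with the number of banks, while the base-logic-die path is capped by the TSV channel count.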