2021
DOI: 10.1109/access.2021.3110993
DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Abstract: Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Prior NDP works investigate the root causes of d…

Cited by 53 publications (32 citation statements)
References 390 publications (281 reference statements)
“…We focus on three characteristics of NDP architectures that are of particular importance in the synchronization context. First, NDP architectures typically do not have a shared level of cache memory [8, 19, 25, 38, 42-46, 49, 55, 67, 98, 110, 111, 113, 119, 155, 158], since the NDP-suited workloads usually do not benefit from deep cache hierarchies due to their poor locality [33, 43, 133, 143]. Second, NDP architectures do not typically support conventional hardware cache coherence protocols [8, 19, 25, 38, 42-45, 49, 55, 67, 82, 98, 111, 119, 155, 158], because they would add area and traffic overheads [46, 143], and would incur high complexity and latency [4], limiting the benefits of NDP.…”
Section: Memory Arrays
confidence: 99%
“…Recent research demonstrates the benefits of NDP for parallel applications, e.g., for genome analysis [23,84], graph processing [8,9,20,21,112,155,158], databases [20,38], security [54], pointer-chasing workloads [25,60,67,99], and neural networks [19,45,82,98]. In general, these applications exhibit high parallelism, low operational intensity, and relatively low cache locality [15,16,33,50,133], which make them suitable for NDP.…”
Section: Introduction
confidence: 99%
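The passage above attributes NDP suitability to high parallelism, low operational intensity, and low cache locality. As a minimal illustrative sketch of the operational-intensity criterion (the helper names, the hypothetical machine numbers, and the simplified roofline test are assumptions, not taken from the paper):

```python
# Illustrative sketch: operational intensity is the ratio of arithmetic
# operations to bytes moved to/from memory. Under a simple roofline model,
# a kernel with operational intensity below the machine balance is
# memory-bound, making it a candidate for Near-Data Processing (NDP).

def operational_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of DRAM traffic."""
    return flops / bytes_moved

def is_memory_bound(oi: float, peak_flops: float, peak_bw: float) -> bool:
    """A kernel is memory-bound when its operational intensity falls below
    the machine balance (peak FLOPs per second / peak bytes per second)."""
    machine_balance = peak_flops / peak_bw
    return oi < machine_balance

# Example: streaming vector add c[i] = a[i] + b[i] over 64-bit floats:
# 1 FLOP per element, 24 bytes moved (two loads + one store).
oi = operational_intensity(flops=1, bytes_moved=24)  # about 0.042 FLOP/byte

# Hypothetical machine: 2 TFLOP/s compute peak, 200 GB/s memory bandwidth.
print(is_memory_bound(oi, peak_flops=2e12, peak_bw=2e11))  # prints True
```

With a machine balance of 10 FLOP/byte, the vector-add kernel (about 0.042 FLOP/byte) sits far below the roofline knee, which is the regime the quoted works target with NDP.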
“…Since main memory is a growing system performance and energy bottleneck [12,39,58,100,103,107,111,134,149,153,155], a RowHammer mitigation mechanism should exhibit acceptable performance and energy overheads at low area cost when configured for more vulnerable DRAM chips.…”
Section: Scaling With Increasing RowHammer Vulnerability
confidence: 99%
“…The key question in this approach is which functions in an application should be offloaded for PNM acceleration. Several recent works tackle this question for various applications, e.g., mobile consumer workloads [7], GPGPU workloads [86,87], graph processing and in-memory database workloads [62,179], and a wide variety of workloads from many domains [16]. We will discuss function-level PNM acceleration of mobile consumer workloads in this section, focusing on our recent work on the topic [7].…”
Section: Function-Level PNM Acceleration of Mobile Consumer Workloads
confidence: 99%
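The key question named above, which functions to offload for PNM acceleration, is commonly answered by profiling. As a minimal sketch of one plausible profile-driven heuristic (the threshold, data, and helper names are illustrative assumptions, not the cited works' actual methodology):

```python
# Illustrative sketch: rank functions by last-level-cache misses per
# kilo-instruction (MPKI) and flag the most memory-intensive ones as
# candidates for processing-near-memory (PNM) offloading.

from typing import List, NamedTuple

class FunctionProfile(NamedTuple):
    name: str
    instructions: int  # dynamic instruction count from a profiler
    llc_misses: int    # last-level cache misses from a profiler

def mpki(p: FunctionProfile) -> float:
    """Last-level-cache misses per kilo-instruction."""
    return 1000.0 * p.llc_misses / p.instructions

def pnm_candidates(profiles: List[FunctionProfile],
                   mpki_threshold: float = 10.0) -> List[str]:
    """Functions whose MPKI exceeds the threshold are memory-bound enough
    that moving their computation near memory may pay off."""
    return [p.name for p in profiles if mpki(p) > mpki_threshold]

# Hypothetical profiles for two functions of a mobile workload.
profiles = [
    FunctionProfile("texture_tiling", instructions=1_000_000, llc_misses=40_000),
    FunctionProfile("ui_layout",      instructions=2_000_000, llc_misses=2_000),
]
print(pnm_candidates(profiles))  # prints ['texture_tiling']
```

Here `texture_tiling` (MPKI 40) crosses the assumed threshold while `ui_layout` (MPKI 1) does not; real offloading decisions also weigh data-sharing and invocation-overhead costs that this sketch omits.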
“…Across all of these systems, the data working set sizes of modern applications are rapidly growing, while the need for fast analysis of such data is increasing. Thus, main memory is becoming an increasingly significant bottleneck across a wide variety of computing systems and applications [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]. Alleviating the main memory bottleneck requires the memory capacity, energy, cost, and performance to all scale in an efficient manner across technology generations.…”
Section: Introduction
confidence: 99%