2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS)
DOI: 10.1109/samos.2011.6045469

Breaking the bandwidth wall in chip multiprocessors

Abstract: In throughput-aware CMPs like GPUs and DSPs, software-managed streaming memory systems are an effective way to tolerate high latencies. For example, the Cell/B.E. incorporates local memories, and data transfers to/from those memories are overlapped with computation using DMAs. In such designs, the latency of the memory system has little impact on performance; instead, memory bandwidth becomes critical. With the increase in the number of cores, conventional DRAMs no longer suffice to satisfy the bandwidth demand. Henc…
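As a rough illustration of the software-managed streaming style the abstract describes, the sketch below overlaps the fetch of the next data block with computation on the current one (double buffering). The dma_get_async/dma_wait helpers are hypothetical stand-ins, implemented here with a plain memcpy so the example is self-contained; on a machine like the Cell/B.E. they would map onto the local store's asynchronous DMA commands.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 1024  /* elements per streamed block (illustrative size) */

/* Hypothetical DMA helpers: real code would issue asynchronous transfers to
 * a DMA engine and wait on a completion tag; these synchronous stand-ins
 * keep the sketch compilable. */
static void dma_get_async(float *local, const float *global, size_t n, int tag)
{
    (void)tag;
    memcpy(local, global, n * sizeof *local);
}
static void dma_wait(int tag) { (void)tag; }

/* Stream n_chunks blocks through two local buffers, starting the transfer of
 * block i+1 before computing on block i. */
float stream_sum(const float *global, size_t n_chunks)
{
    static float buf[2][CHUNK];
    float sum = 0.0f;

    dma_get_async(buf[0], global, CHUNK, 0);              /* prefetch block 0 */
    for (size_t i = 0; i < n_chunks; ++i) {
        int cur = (int)(i & 1), nxt = cur ^ 1;
        if (i + 1 < n_chunks)                             /* fetch block i+1  */
            dma_get_async(buf[nxt], global + (i + 1) * CHUNK, CHUNK, nxt);
        dma_wait(cur);                                    /* block i resident */
        for (size_t j = 0; j < CHUNK; ++j)                /* compute on it    */
            sum += buf[cur][j];
    }
    return sum;
}
```

Once the transfers are truly asynchronous, each iteration costs roughly the maximum of transfer time and compute time, so the loop's throughput is set by memory bandwidth rather than memory latency, which is exactly the regime the abstract targets.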

Cited by 6 publications (9 citation statements); references 19 publications.
“…Stencil calculations perform global sweeps through data structures that are typically much larger than the available data caches. As a result, data from main memory often cannot be transferred fast enough to avoid stalling the computational units on modern microprocessors [74,18,12,70,66]. Reorganizing these computations to fit into the caches has principally focused on tiling optimizations that exploit locality by performing operations on cache-sized blocks of data in each processor before moving on to the next block [56].…”
Section: Stencil Computations (confidence: 99%)
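As a minimal sketch of the cache-blocking idea described in the statement above (names and tile sizes are assumptions, not taken from the cited works), the loop nest below sweeps a 2D 5-point stencil one tile at a time so each tile stays cache-resident while it is being updated:

```c
#include <stddef.h>

#define TI 64  /* tile dimensions chosen so a tile fits in cache (assumption) */
#define TJ 64

/* One sweep of a 2D 5-point stencil over an n x n row-major grid,
 * processed in TI x TJ blocks before moving on to the next block. */
void stencil2d_tiled(const double *in, double *out, size_t n)
{
    for (size_t ii = 1; ii + 1 < n; ii += TI)
        for (size_t jj = 1; jj + 1 < n; jj += TJ) {
            size_t i_end = ii + TI < n - 1 ? ii + TI : n - 1;
            size_t j_end = jj + TJ < n - 1 ? jj + TJ : n - 1;
            for (size_t i = ii; i < i_end; ++i)
                for (size_t j = jj; j < j_end; ++j)
                    out[i * n + j] = 0.25 * (in[(i - 1) * n + j] + in[(i + 1) * n + j]
                                           + in[i * n + j - 1] + in[i * n + j + 1]);
        }
}
```

Spatial tiling of this kind helps when a few grid rows no longer fit in cache; the larger gains reported in the tiling literature come from additionally blocking across sweeps (time steps) so each block is reused several times before it is evicted.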
“…CMS has broad applicability because a wide variety of stencil-based kernels are memory bound [65,70,32,36]. Stencil-based kernels are also critical because they comprise the building blocks of applications ranging from image processing in consumer devices to the largest scale HPC applications such as climate modeling and fluid simulations.…”
Section: Operation Completion Time (confidence: 99%)
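A back-of-the-envelope arithmetic-intensity estimate (illustrative numbers, not taken from the cited papers) shows why such kernels end up memory bound: a 3D 7-point stencil performs on the order of 8 floating-point operations per grid point while, even with perfect reuse of neighbouring points from cache, it must stream at least one 8-byte double in from main memory and write one back, i.e. roughly 0.5 flop per byte. A chip able to sustain 100 Gflop/s would then need about 200 GB/s of DRAM bandwidth to stay busy, well beyond what conventional memory interfaces deliver, so performance is set by the memory system rather than by the arithmetic units.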