2009 Design, Automation & Test in Europe Conference & Exhibition (DATE 2009)
DOI: 10.1109/date.2009.5090768

Adaptive prefetching for shared cache based chip multiprocessors

Abstract: Chip multiprocessors (CMPs) present a unique scenario for software data prefetching, with subtle tradeoffs between memory bandwidth and performance. In a shared-L2-based CMP, multiple cores compete for the shared on-chip cache space and the limited off-chip pin bandwidth. Purely software-based prefetching techniques tend to increase this contention, leading to performance degradation. In some cases, prefetches can become harmful by evicting useful data from the shared cache whose next usage is earlier than that of the prefetched data.
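
To make the abstract's mechanism concrete: software data prefetching typically means the compiler or programmer inserts explicit prefetch instructions ahead of a load stream. The sketch below is an illustrative assumption, not the paper's adaptive scheme; PF_DIST, prefetch_enabled, and sum_with_prefetch are hypothetical names standing in for whatever an adaptive policy would tune, for example throttling prefetches when they begin evicting useful shared-L2 data.

#include <stddef.h>

/* Illustrative sketch only, not the paper's algorithm. PF_DIST,
 * prefetch_enabled, and sum_with_prefetch are hypothetical names;
 * an adaptive policy would tune them from observed shared-L2
 * pollution or off-chip bandwidth pressure. */
#define PF_DIST 16               /* elements ahead; assumed value   */
static int prefetch_enabled = 1; /* an adaptive policy flips this   */

long sum_with_prefetch(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (prefetch_enabled && i + PF_DIST < n)
            /* GCC/Clang builtin: rw=0 (read), locality=1 (low
             * temporal locality, so the fetched line is a weaker
             * candidate to displace hot shared-cache data where
             * the hardware honors the hint). */
            __builtin_prefetch(&a[i + PF_DIST], 0, 1);
        s += a[i];
    }
    return s;
}

The low locality hint is one lever against the shared-cache pollution the abstract describes; disabling prefetch entirely under bandwidth contention is the other, and deciding between them at runtime is the kind of adaptivity the paper's title refers to.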

Cited by 6 publications (14 citation statements, published 2012-2017)
References 23 publications
“…Unpredictable memory access streams make extracting benefits from memory prefetching difficult [37,57]. Depending on the access pattern, only 14%-97% of memory bandwidth can actually be utilized [57].…”
Section: Memory Access Streams and Efficiency
confidence: 99%
“…This is done with local independent hardware prefetch [37] or cache fill streams for a cache-coherent CMP, a list of outstanding load-store requests for a massively multithreaded architecture like a GPU, or via a sequence of DMA requests for a local store architecture like STI Cell [62]. In all of these cases, requests are sent independently over an unpredictable network and thus arrive in nearly random order to memory [76,69].…”
Section: Memory Access Streams and Efficiency
confidence: 99%
“…This is done with local independent hardware prefetch [19] or cache fill streams for a cache-coherent CMP, a list of outstanding load-store requests for a massively multithreaded architecture like a GPU, or via a sequence of DMA requests for a local store architecture like STI Cell [34]. In all of these cases, requests are sent independently over an unpredictable network and thus arrive in nearly random order to memory [38,42].…”
Section: Memory Access Streams and Efficiency
confidence: 99%
“…Memory prefetching techniques focus on reducing latency and offer little benefit in systems that are bound by memory bandwidth. Prefetching techniques typically perform predictions independently at each processor and thus create out-of-order access patterns [19].…”
Section: Related Work
confidence: 99%