2009 21st International Symposium on Computer Architecture and High Performance Computing
DOI: 10.1109/sbac-pad.2009.12

Analysis of Performance Dependencies in NUCA-Based CMP Systems

Abstract: Improvements in semiconductor nanotechnology have continuously provided a growing number of faster and smaller per-chip transistors. Consequently, classical techniques for boosting performance, such as increasing the clock frequency and the amount of work performed at each clock cycle, can no longer deliver significant improvements due to energy constraints and wire-delay effects. As a consequence, designers' interest has shifted toward the implementation of systems with multiple cores per chip (Chi…

Cited by 10 publications (19 citation statements) | References 22 publications

“…Again, we use the name in the first column of the table to refer to the corresponding mix of 4 benchmarks. 3 Chart 5. Average latencies for DRAMs.…”
Section: Methods (mentioning)
confidence: 99%
“…This configuration can be further improved with more intelligent placement of pages in 3D-DRAM and PCM, or migrate pages between these devices. It should be noted that such migrations have some similarities with NUCA caches such as [3,4]. However, unlike NUCAs, pages cannot be replicated in 3D DRAM and PCM.…”
Section: Comparisons (mentioning)
confidence: 99%
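The migration the quoted passage refers to can be pictured with a small sketch: pages are promoted from PCM to a 3D-DRAM layer when they become hot, and, unlike NUCA block replication, a page is moved rather than copied. Everything below (device names, threshold, capacity) is an illustrative assumption, not the mechanism of the cited work.

```cpp
// Minimal sketch of a hot-page migration policy between PCM and a 3D-DRAM
// layer. Names and thresholds are assumptions for illustration. Unlike NUCA
// block replication, a page lives in exactly one device at a time: migration
// moves it, never copies it.
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <unordered_set>

enum class Device { PCM, DRAM3D };

struct PagePlacement {
    std::unordered_map<uint64_t, Device>   location;   // page -> current device
    std::unordered_map<uint64_t, unsigned> hit_count;  // page -> accesses so far
    std::unordered_set<uint64_t> dram_resident;        // pages currently in 3D-DRAM
    static constexpr unsigned kHotThreshold = 64;      // assumed migration trigger
    static constexpr size_t   kDramCapacityPages = 4;  // tiny capacity for the demo

    void access(uint64_t page) {
        if (!location.count(page)) location[page] = Device::PCM;  // first touch lands in PCM
        if (++hit_count[page] >= kHotThreshold &&
            location[page] == Device::PCM &&
            dram_resident.size() < kDramCapacityPages) {
            // Migrate: the PCM copy is logically invalidated, no replica remains.
            location[page] = Device::DRAM3D;
            dram_resident.insert(page);
            std::cout << "migrated page " << page << " to 3D-DRAM\n";
        }
    }
};

int main() {
    PagePlacement pp;
    for (int i = 0; i < 100; ++i) pp.access(42);   // hot page gets promoted
    pp.access(7);                                  // cold page stays in PCM
    std::cout << "page 7 in "
              << (pp.location[7] == Device::PCM ? "PCM" : "3D-DRAM") << "\n";
}
```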
“…Instead, in this paper we propose a solution that makes block migration effective in boosting performance while maintaining scalability, since the limited replication mechanism of Re-NUCA allows at most two L2 copies of a shared block without taking care of how many processors are sharing it. As the replication mechanism we introduce extends the behaviors of a classical D-NUCA, we considered the migration scheme with the FMA and Collector optimizations proposed in [29]. The architecture adopts a directory version of MESI as the baseline coherence protocol, in which the directory is non-blocking [19,20,21] and distributed.…”
Section: Related Work (mentioning)
confidence: 99%
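The limited-replication rule described above (at most two L2 copies of a shared block, regardless of how many processors share it) can be sketched as a simple bookkeeping check. The structure and names below are assumptions for illustration only; Re-NUCA's actual mechanism and its interaction with the directory-based MESI protocol are defined in the cited paper.

```cpp
// Minimal sketch of "limited replication" bookkeeping: at most two L2 copies
// of a shared block are allowed, no matter how many cores share it. The
// decision logic is an illustrative assumption, not the Re-NUCA protocol.
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

struct ReplicaTracker {
    static constexpr size_t kMaxL2Copies = 2;              // cap from the description above
    std::unordered_map<uint64_t, std::vector<int>> copies; // block -> L2 banks holding it

    // Returns true if the requesting bank may hold a local copy,
    // false if it must access an existing remote copy instead.
    bool may_replicate(uint64_t block, int l2_bank) {
        auto &banks = copies[block];
        for (int b : banks)
            if (b == l2_bank) return true;                 // this bank already has a copy
        if (banks.size() < kMaxL2Copies) {
            banks.push_back(l2_bank);
            return true;
        }
        return false;                                      // cap reached: no third copy
    }
};

int main() {
    ReplicaTracker rt;
    std::cout << rt.may_replicate(0x100, 0) << "\n";  // 1: first copy allowed
    std::cout << rt.may_replicate(0x100, 3) << "\n";  // 1: second copy allowed
    std::cout << rt.may_replicate(0x100, 5) << "\n";  // 0: third copy denied
}
```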
“…During the time interval in which a block is moving from the source to the destination bank, it is important to manage subsequent requests that could be issued by other L1 caches. A false miss [7,29] is a race condition that can arise when a subsequent request for a migrating block is received from the sender after the block has been issued (and the corresponding cache line has been deallocated), and by the receiver before the block has arrived. Such condition results in an extra off-chip access even if the referred block is actually cached.…”
Section: A. D-NUCA Basics: The FMA Protocol (mentioning)
confidence: 99%
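A minimal sketch of the false-miss window follows: the sender deallocates the line as soon as the migration is issued, so a request arriving before the receiver installs the block would miss in both banks and go off-chip. The in-flight table shown here is one generic way to close the window, not necessarily the FMA mechanism of [7, 29]; all data-structure names are assumptions.

```cpp
// Minimal sketch of the false-miss window during block migration between two
// NUCA banks, plus a simple in-flight table that closes it. Illustrative
// assumptions only, not the protocol of the cited papers.
#include <cstdint>
#include <iostream>
#include <unordered_set>

struct Bank { std::unordered_set<uint64_t> lines; };

struct Nuca {
    Bank src, dst;
    std::unordered_set<uint64_t> in_flight;  // blocks currently migrating

    void start_migration(uint64_t blk) {     // sender deallocates the line immediately
        src.lines.erase(blk);
        in_flight.insert(blk);
    }
    void finish_migration(uint64_t blk) {    // receiver installs the block
        dst.lines.insert(blk);
        in_flight.erase(blk);
    }
    // Without the in_flight check, a request arriving mid-migration misses in
    // both banks and triggers a spurious off-chip access: the "false miss".
    bool lookup(uint64_t blk) const {
        return src.lines.count(blk) || dst.lines.count(blk) || in_flight.count(blk);
    }
};

int main() {
    Nuca n;
    n.src.lines.insert(0xABC);
    n.start_migration(0xABC);               // block is now only "in flight"
    std::cout << "hit during migration: " << n.lookup(0xABC) << "\n";  // 1, false miss avoided
    n.finish_migration(0xABC);
    std::cout << "hit after migration:  " << n.lookup(0xABC) << "\n";  // 1
}
```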
“…This policy, called Dynamic NUCA or D-NUCA, achieves lower access latencies than S-NUCA but complicates the process of requesting a block to the LLC, leading to a tradeoff between access time and NoC traffic since all the banks of a bank set must be accessed, leading to either high latency (sequential search) or more traffic (parallel search). Furthermore, it was proposed for a single-core system, and its extension to CMPs does not show the expected performance improvements due to various issues that have to be managed when multiple cores are sharing the same D-NUCA, such as the ping-pong behavior and the race conditions which lead to false and multiple misses [11], [26].…”
Section: Block Mapping Policies in Shared Banked LLCs (mentioning)
confidence: 99%
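The latency/traffic tradeoff between sequential and parallel bank-set search can be made concrete with a toy model: sequential search probes one bank at a time, parallel search probes all banks of the set at once. Probe latency and the bank-set layout below are assumed values for illustration, not figures from the cited works.

```cpp
// Toy model of D-NUCA bank-set search: sequential search sends one probe at a
// time (low traffic, latency grows with the position of the hit), parallel
// search probes every bank at once (one round trip of latency, traffic equal
// to the bank-set size). Latencies are assumed values.
#include <iostream>
#include <vector>

struct SearchResult { unsigned latency_cycles; unsigned messages; };

constexpr unsigned kProbeLatency = 10;  // assumed per-probe round trip, in cycles

SearchResult sequential_search(const std::vector<bool> &bank_has_block) {
    SearchResult r{0, 0};
    for (bool hit : bank_has_block) {   // probe banks one after another
        ++r.messages;
        r.latency_cycles += kProbeLatency;
        if (hit) return r;
    }
    return r;                           // miss in the whole bank set
}

SearchResult parallel_search(const std::vector<bool> &bank_has_block) {
    // All probes are issued at once: one round trip of latency, N messages.
    return {kProbeLatency, static_cast<unsigned>(bank_has_block.size())};
}

int main() {
    std::vector<bool> bank_set = {false, true, false, false};  // hit in the 2nd bank
    auto s = sequential_search(bank_set);
    auto p = parallel_search(bank_set);
    std::cout << "sequential: " << s.latency_cycles << " cycles, "
              << s.messages << " messages\n";   // 20 cycles, 2 messages
    std::cout << "parallel:   " << p.latency_cycles << " cycles, "
              << p.messages << " messages\n";   // 10 cycles, 4 messages
}
```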