2014
DOI: 10.1109/tc.2013.161
A Practical Data Classification Framework for Scalable and High Performance Chip-Multiprocessors

Abstract: State-of-the-art chip multiprocessor (CMP) proposals emphasize general optimizations designed to deliver computing power for many types of applications. Potentially significant performance improvements that leverage application-specific characteristics, such as data access behavior, are missed by this approach. In this paper, we demonstrate how scalable and high-performance parallel systems can be built by classifying data accesses into different categories and treating them differently. We develop a novel comp…

Cited by 5 publications (5 citation statements) · References 33 publications
“…Like our study, some prior studies in the literature classify data blocks as private or shared for different purposes in CMPs, such as reducing coherence overhead or the access latency to distributed caches. Hardavellas et al. [17] and Li et al. [23] categorize data blocks and keep private blocks in the nonuniform distributed shared cache (nonuniform cache access, NUCA) slice of the requesting core, where access latency depends on the physical distance between the core demanding the data and the L2 cache slice storing it. The primary aim of these two studies is to reduce NUCA access latency through intelligent placement, migration, and replication mechanisms.…”
Section: Related Work and Motivation
confidence: 99%
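The private-versus-shared classification these studies build on can be sketched in a few lines. The following is a minimal, hypothetical model (not the actual mechanism of [17] or [23], and all names are assumptions): a block remains private while only one core has touched it and is placed in that core's local NUCA slice; once a second core accesses it, it is reclassified as shared and placed by address interleaving across all slices.

```python
# Hypothetical sketch of private/shared block classification with
# NUCA-aware placement. Not the cited papers' implementation.

class BlockClassifier:
    def __init__(self, num_slices):
        self.num_slices = num_slices
        self.owner = {}      # block address -> sole accessing core so far
        self.shared = set()  # blocks touched by more than one core

    def access(self, block_addr, core_id):
        """Record an access; return the NUCA slice the block should live in."""
        if block_addr in self.shared:
            # Shared blocks use address-interleaved placement.
            return block_addr % self.num_slices
        prev = self.owner.setdefault(block_addr, core_id)
        if prev != core_id:
            # A second core touched the block: reclassify as shared.
            self.shared.add(block_addr)
            del self.owner[block_addr]
            return block_addr % self.num_slices
        # Private blocks stay in the requesting core's local slice,
        # minimizing the physical distance (NUCA latency) to the data.
        return core_id
```

The transition is one-way: once a block is observed as shared it never reverts to private, which keeps the bookkeeping simple at the cost of some lost opportunity after, e.g., thread migration.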
“…Moreover, the private data detection mechanisms in this paper are quite different from those used in these studies. Whereas [17] classifies cache access patterns via the OS, Li et al. [23] detect private data offline with compiler assistance. As noted in previous studies [3], [10], we believe that more private data blocks can be detected at runtime than offline.…”
Section: Related Work and Motivation
confidence: 99%
“…Some prior studies exploited private data detection to enable high performance in many-core architectures by mitigating the overhead of managing coherence. The detection of private data might be done offline with compiler assistance [15]. Although this approach incurs no runtime overhead and requires no extra hardware, there is a limit on the amount of private data that can be detected statically.…”
Section: Key Observation
confidence: 99%
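The reason runtime detection finds more private data than static analysis is that a compiler must conservatively mark as shared any page that *might* be accessed by two cores, whereas a runtime scheme only demotes a page once a second core actually touches it. A trace-level sketch of that runtime rule, under assumed names (not the scheme of [3], [10], or [15]):

```python
# Hypothetical runtime private-page classifier: a page is private to the
# first core that touches it until a different core accesses it, after
# which it is permanently marked shared. Not any cited paper's code.

def classify_pages(access_trace):
    """access_trace: iterable of (page, core) pairs, in program order.
    Returns {page: 'private' | 'shared'}."""
    first_core = {}  # page -> first core observed touching it
    shared = set()   # pages touched by a second core (sticky)
    for page, core in access_trace:
        if page in shared:
            continue
        owner = first_core.setdefault(page, core)
        if owner != core:
            shared.add(page)
    return {p: ('shared' if p in shared else 'private') for p in first_core}
```

A compiler performing the same classification offline would have to merge all *possible* interleavings, so any page reachable from two threads lands in the shared set; the runtime version only pays for sharing that actually occurs.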
“…Morrigan sets the access bit of all prefetched pages, since the x86 memory consistency model obliges all TLB prefetches to do so [48]. Therefore, Morrigan does not complicate TLB shootdowns [53,57,88,181,288], because the information about the prefetched instruction PTEs is conveyed to the OS as usual. Regarding the impact on the page replacement policy, a prefetch is harmful to that policy if it is evicted from the TLB PB without providing any hit and does not belong to the active footprint of the application.…”
Section: Page Replacement Policy and TLB Shootdowns
confidence: 99%
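The "harmful prefetch" criterion above can be made concrete with a small model of a TLB prefetch buffer (PB) that tracks whether each prefetched entry ever serves a hit before eviction. This is a hedged sketch under assumed names, not Morrigan's design, and it captures only the evicted-without-a-hit half of the definition (the active-footprint check is omitted):

```python
from collections import OrderedDict

# Hypothetical FIFO TLB prefetch buffer that counts prefetches evicted
# without ever providing a hit. Not the actual Morrigan structure.

class PrefetchBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # page -> was_hit flag, in FIFO order
        self.harmful = 0

    def prefetch(self, page):
        if page in self.entries:
            return
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry; if it never served a hit, the
            # prefetch wasted a slot and is counted as harmful.
            _, was_hit = self.entries.popitem(last=False)
            if not was_hit:
                self.harmful += 1
        self.entries[page] = False

    def lookup(self, page):
        if page in self.entries:
            self.entries[page] = True  # the prefetch proved useful
            return True
        return False
```

Counting harmful evictions this way lets a prefetcher throttle itself when wasted prefetches start displacing useful entries.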
“…The PPM module does not introduce new security vulnerabilities, since it solely leverages the page size information that is part of the address translation metadata available after the TLB access. An adversary could not use events such as context switches and TLB shootdowns [53,57,88,181,288] to violate the security guarantees of PPM; this would be possible only if PPM stored the page size information in a data structure that was not flushed upon TLB shootdowns and context switches.…”
Section: Security
confidence: 99%