2009 IEEE International Symposium on Workload Characterization (IISWC)
DOI: 10.1109/iiswc.2009.5306792

A communication characterisation of Splash-2 and Parsec

Cited by 166 publications (121 citation statements)
References 15 publications
“…We utilized the Parsec benchmarks [10] with 'simsmall' input data run with 32 cores (arranged in a 8 × 4 mesh), the maximum available in our configuration. As discussed in [7] the spatial patterns of the application composing this benchmark suite do not present any noticeable 'hot spot'. We will see later that this may happen with the transactional memory applications.…”
Section: B Directory-based Cache Coherency (mentioning)
confidence: 99%
“…Their work requires the generation of memory traces to guide data mapping for future executions of the applications, which may lead to a high overhead [3]. A similar technique is used in Marathe and Mueller [17] to perform data mapping dynamically.…”
Section: Related Work (mentioning)
confidence: 99%
“…For the random mapping, we randomly generated a thread and data mapping for each execution. For the Oracle mapping, we generated traces of all memory accesses for each application and performed an analysis of the sharing and page usage patterns, similar to [3]. Autopin was executed with 5 mappings: the Oracle mapping and 4 random mappings.…”
Section: Comparison To Related Work (mentioning)
confidence: 99%
“…Further, the overhead of communication between threads is often reduced, because of improved physical proximity between processing elements. For example, Barrow-Williams et al [10] use a 10-cycle latency for shared L2 cache access in their characterization of communication patterns of parallel benchmarks on modern multicore architectures.…”
Section: Changing Cost Models (mentioning)
confidence: 99%