2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
DOI: 10.1109/ipdpsw.2012.328
Performance Estimation of GPUs with Cache

Cited by 18 publications (7 citation statements). References 8 publications.
“…Another cache model [15] is part of a complete GPU model, but assumes hit and miss rates to be known. Furthermore, other work has used reuse distance to analyse non-GPU multi-core and many-core workloads [6,17,18].…”
Section: Related Work
confidence: 99%
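The reuse-distance metric mentioned in the excerpt above can be illustrated with a minimal sketch (the function name and the brute-force window computation are illustrative choices, not the cited papers' implementation):

```python
# Minimal reuse-distance sketch: for each memory access, the reuse
# distance is the number of *distinct* addresses touched since the
# previous access to the same address (infinite on first use).
def reuse_distances(trace):
    last_seen = {}            # address -> index of its last access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # distinct addresses accessed strictly between the two uses
            window = set(trace[last_seen[addr] + 1 : i])
            distances.append(len(window))
        else:
            distances.append(None)    # first access: infinite distance
        last_seen[addr] = i
    return distances

print(reuse_distances(["a", "b", "c", "a", "a", "b"]))
# -> [None, None, None, 2, 0, 2]
```

The connection to cache modelling: under a fully associative LRU cache with capacity S lines, an access hits exactly when its reuse distance is less than S, which is why reuse-distance profiles are a popular basis for cache models.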
“…The figure confirms the hypothesis, as the number of varied set bits (final row) corresponds to the number of bits included in the hashing function, counting from the log2 of the stride. For example, with a stride of 2^12 and 128 loads, bits 12-18 are included, of which only 4 bits (13, 14, 15, 17) are used in the computation of the set index. …”
Section: Associativity Micro-benchmark
confidence: 99%
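Which address bits vary for such a strided access pattern can be reproduced with a small sketch (plain Python arithmetic, not the cited micro-benchmark itself):

```python
# For n_loads accesses at a fixed stride, report which address bits
# differ between some addresses in the pattern: only those bits can
# possibly influence the observed set-index behaviour.
def varying_bits(stride, n_loads):
    addrs = [i * stride for i in range(n_loads)]
    or_all, and_all = 0, ~0
    for a in addrs:
        or_all |= a
        and_all &= a
    diff = or_all & ~and_all          # bits set in some but not all addresses
    return [b for b in range(diff.bit_length()) if (diff >> b) & 1]

print(varying_bits(2**12, 128))       # -> [12, 13, 14, 15, 16, 17, 18]
```

This reproduces the quoted range (bits 12-18 for a stride of 2^12 and 128 loads); determining which subset of those bits the hardware's set-index hash actually consumes (bits 13, 14, 15 and 17 in the quoted experiment) then requires inspecting the measured miss patterns.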
“…The applications are launched without considering any adaptation to architectural features of new GPUs. As a consequence, cache-unaware data accesses generate a large number of cache misses [16]. We can expect much better performance if the applications and MR frameworks are tuned according to the principle of locality.…”
Section: Code Restructuring
confidence: 99%
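The cost of cache-unaware access order can be illustrated with a line-granularity sketch (the 128-byte line size and 4-byte elements are assumptions for illustration, not parameters from the cited work; a line switch only approximates a miss on a small cache):

```python
# Count cache-line switches for row-major vs column-major traversal of a
# matrix stored in row-major order: every switch to a different line is
# a potential miss when the cache cannot hold the working set.
LINE = 128    # assumed cache-line size in bytes
ELEM = 4      # assumed element size in bytes

def line_switches(n, column_major=False):
    switches, last_line = 0, None
    for i in range(n):
        for j in range(n):
            r, c = (j, i) if column_major else (i, j)
            line = (r * n + c) * ELEM // LINE
            if line != last_line:
                switches += 1
                last_line = line
    return switches

n = 64
print(line_switches(n), line_switches(n, column_major=True))
# -> 128 4096: the locality-unaware order touches a new line on every access
```

Restructuring the traversal to match the storage order (the "principle of locality" tuning the excerpt refers to) collapses the line-switch count by a factor of LINE // ELEM here.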
“…These counts are taken from an in-house developed cache simulator and address-trace generator. This cache simulator [16] has already been verified against Dinero IV.…”
Section: Code Restructuring
confidence: 99%
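A trace-driven cache simulator of the kind described can be sketched in a few lines (a minimal set-associative LRU model with assumed geometry; not the authors' in-house tool and not Dinero IV):

```python
from collections import OrderedDict

# Minimal set-associative LRU cache simulator: feed it an address trace
# and it returns (hits, misses), the kind of counts a trace-driven tool
# such as Dinero IV reports.
def simulate(trace, line_size=128, n_sets=64, ways=4):
    sets = [OrderedDict() for _ in range(n_sets)]   # tag -> None, in LRU order
    hits = misses = 0
    for addr in trace:
        line = addr // line_size
        s, tag = line % n_sets, line // n_sets
        resident = sets[s]
        if tag in resident:
            hits += 1
            resident.move_to_end(tag)               # mark most recently used
        else:
            misses += 1
            if len(resident) >= ways:
                resident.popitem(last=False)        # evict the LRU way
            resident[tag] = None
    return hits, misses

print(simulate([0, 4, 128, 0]))   # -> (2, 2): two line hits, two cold misses
```

Verifying such a model against an established simulator, as the excerpt describes, amounts to running the same address trace through both and comparing the hit/miss counts.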
“…A computing system comprises a conventional CPU (host) and at least one GPU (device). GPUs are massively parallel coprocessors/accelerators equipped with a large number of arithmetic execution units [7]. A CUDA source program comprises various stages that are executed either on the CPU (host) or on a GPU (device).…”
Section: Introduction
confidence: 99%