2014 IEEE Symposium on Computer Applications and Communications
DOI: 10.1109/scac.2014.10

A Performance Prediction Model for Memory-Intensive GPU Kernels

Abstract: Commodity graphics processing units (GPUs) have rapidly evolved into high-performance accelerators for data-parallel computing, combining a large array of processing cores with the CUDA programming model and its C-like interface. However, optimizing an application for maximum performance on the GPU architecture is not a trivial task, owing to the tremendous change from conventional multi-core to manycore architectures. Besides, GPU vendors do not disclose much detail about the characteristics of the GPU…

Cited by 3 publications (2 citation statements). References 12 publications.
“…Also, array access affects how memory is accessed, which is critical for performance diagnosis [44]. The performance of memory in GPUs relies on the memory access pattern that describes how data indexes are referred to by consecutive threads within a warp [33]. For example, coalesced memory access improves the performance of a program as a warp will not access multiple memory transactions [33].…”
Section: Source Code Analysis
confidence: 99%
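The coalescing behavior quoted above can be illustrated with a small model. The sketch below counts how many memory transactions a single warp would issue for a given per-thread indexing pattern; the 32-thread warp width, 128-byte transaction size, and 4-byte elements are typical NVIDIA-style assumptions for illustration, not figures taken from the paper or the citing works.

```python
# Toy model of GPU global-memory coalescing: count how many 128-byte
# transactions one 32-thread warp needs for a given access pattern.
# Warp width, transaction size, and element size are illustrative
# NVIDIA-style defaults (assumptions, not from the cited works).

WARP_SIZE = 32
TRANSACTION_BYTES = 128
ELEM_BYTES = 4

def transactions_for_warp(index_of):
    """index_of(t) -> array element index accessed by thread t of the warp.
    Returns the number of distinct 128-byte segments the warp touches."""
    segments = {(index_of(t) * ELEM_BYTES) // TRANSACTION_BYTES
                for t in range(WARP_SIZE)}
    return len(segments)

# Coalesced: consecutive threads read consecutive elements,
# so the whole warp is served by a single transaction.
coalesced = transactions_for_warp(lambda t: t)

# Strided by 32 elements: every thread lands in its own
# 128-byte segment, so the warp needs 32 transactions.
strided = transactions_for_warp(lambda t: t * 32)

print(coalesced, strided)  # 1 32
```

This mirrors the quoted statement: with coalesced indexing the warp does not split into multiple memory transactions, while a scattered pattern multiplies the transaction count and hence the memory cost.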
“…The performance of memory in GPUs relies on the memory access pattern that describes how data indexes are referred to by consecutive threads within a warp [33]. For example, coalesced memory access improves the performance of a program as a warp will not access multiple memory transactions [33]. Also, if the cache is not aligned in a CUDA program, it affects the performance by 32 times more than cache-aligned [54].…”
Section: Source Code Analysis
confidence: 99%
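The alignment effect mentioned in the second statement can be sketched with the same kind of transaction-counting model. Here an otherwise coalesced warp load that starts off a 128-byte boundary spills into a second transaction; this shows only the per-warp mechanism, a much milder effect than the 32x program-level figure quoted above, and again uses assumed NVIDIA-style sizes.

```python
# Toy model of alignment: the same coalesced 32 x 4-byte warp load
# needs one 128-byte transaction when its base address is aligned,
# but two when it is shifted off a 128-byte boundary. Sizes are
# illustrative assumptions, not taken from the cited works.

WARP_SIZE = 32
TRANSACTION_BYTES = 128
ELEM_BYTES = 4

def transactions(base_byte):
    """Number of 128-byte segments touched by a coalesced warp load
    starting at byte offset base_byte."""
    segments = {(base_byte + t * ELEM_BYTES) // TRANSACTION_BYTES
                for t in range(WARP_SIZE)}
    return len(segments)

print(transactions(0))   # aligned start: 1 transaction
print(transactions(4))   # shifted by one element: 2 transactions
```

Doubling the transaction count for every warp is what makes misaligned layouts expensive; keeping array base addresses on transaction-size boundaries avoids the extra fetch.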