Proceedings of the 8th ACM International Conference on Computing Frontiers 2011
DOI: 10.1145/2016604.2016637

Bounding the effect of partition camping in GPU kernels

Abstract: Current GPU tools and performance models provide some common architectural insights that guide programmers in writing optimal code. We challenge and complement these performance models and tools by modeling and analyzing a lesser-known but very severe performance pitfall in NVIDIA GPUs, called Partition Camping. Partition Camping is caused by memory accesses that are skewed towards a subset of the available memory partitions, and it may degrade the performance of GPU kernels by up to seven-fold. There is n…
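To make the idea of "skewed" accesses concrete, the following is a minimal sketch rather than the paper's model. It assumes a simplified layout in which global memory is interleaved across 8 partitions in 256-byte chunks; the constants and the helper names `partition_of`, `partition_histogram`, and `skew` are illustrative assumptions, not taken from the abstract.

```python
from collections import Counter

# Assumed layout (hypothetical values for illustration only):
NUM_PARTITIONS = 8     # number of memory partitions
PARTITION_WIDTH = 256  # bytes served by one partition before moving to the next


def partition_of(addr):
    """Map a byte address to a partition under simple interleaving."""
    return (addr // PARTITION_WIDTH) % NUM_PARTITIONS


def partition_histogram(addresses):
    """Count how many of the given accesses land in each partition."""
    counts = Counter(partition_of(a) for a in addresses)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


def skew(hist):
    """Load on the busiest partition relative to a perfectly even share."""
    return max(hist) * len(hist) / sum(hist)


# 64 concurrent accesses to consecutive 256-byte chunks: evenly spread.
spread = [i * PARTITION_WIDTH for i in range(64)]

# 64 concurrent accesses with a stride of 8 * 256 bytes: all hit partition 0.
camped = [i * PARTITION_WIDTH * NUM_PARTITIONS for i in range(64)]

print(partition_histogram(spread), skew(partition_histogram(spread)))
print(partition_histogram(camped), skew(partition_histogram(camped)))
```

The `skew` value is only a load-imbalance indicator under these assumptions; it is not a prediction of the slowdown the abstract reports.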

Cited by 16 publications (4 citation statements)
References: 15 publications
“…Similarly, the second kernel used 64 × 4 as the thread block size instead of the original 256 × 1 size. We chose 64 instead of 32 because the thread block size 32 × 4 causes partition camping in this kernel, which can degrade the kernel performance by as much as sevenfold.…”
Section: Results (mentioning)
confidence: 99%
“…Further, we model CPU-GPU memory copy engine which we found is an important factor on L2 accesses and hit rate, since all DRAM accesses go through the L2, including CPU-GPU memory copies [16]. In order to reduce uneven accesses across memory partitions [17], we add an advanced partition indexing that xors the L2 channel bits with randomly selected bits from the higher row and lower bank bits [18]. In the memory system, we accurately model HBM.…”
Section: Methods (mentioning)
confidence: 99%
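The XOR-based partition indexing mentioned in the statement above can be sketched as follows. This is a simplified illustration, not the cited simulator's scheme: it assumes 8 partitions interleaved at 256 bytes and XORs the channel bits with the three address bits directly above them, whereas the statement describes randomly selected higher row and lower bank bits. All constants and the function names `plain_partition` and `xor_partition` are hypothetical.

```python
# Assumed address layout (hypothetical values for illustration only):
CHANNEL_SHIFT = 8   # log2 of an assumed 256-byte partition width
CHANNEL_BITS = 3    # log2 of an assumed 8 partitions
CHANNEL_MASK = (1 << CHANNEL_BITS) - 1
UPPER_SHIFT = CHANNEL_SHIFT + CHANNEL_BITS  # bits just above the channel bits


def plain_partition(addr):
    """Partition index taken directly from the channel bits."""
    return (addr >> CHANNEL_SHIFT) & CHANNEL_MASK


def xor_partition(addr):
    """Partition index after XORing the channel bits with higher address bits."""
    channel = (addr >> CHANNEL_SHIFT) & CHANNEL_MASK
    upper = (addr >> UPPER_SHIFT) & CHANNEL_MASK
    return channel ^ upper


def histogram(indexer, addresses):
    """Count accesses per partition for a given indexing function."""
    counts = [0] * (1 << CHANNEL_BITS)
    for a in addresses:
        counts[indexer(a)] += 1
    return counts


# A 2048-byte stride keeps the channel bits at zero for every access.
strided = [i * 2048 for i in range(64)]

print(histogram(plain_partition, strided))  # every access lands in partition 0
print(histogram(xor_partition, strided))    # accesses spread over all partitions
```

Under these assumptions the strided pattern that camps on one partition with plain indexing is spread evenly once the higher bits are mixed in; other strides can still collide, which is one reason real designs choose the mixed bits more carefully.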
“…Performance Analysis and Tuning: Researchers have proposed several techniques to analyze GPU performance from various aspects, including branching, degree of coalescing, race conditions, bank conflict, and partition camping [2], [5], [18]. They provide helpful information for the user to identify potential bottlenecks.…”
Section: Introduction (mentioning)
confidence: 99%