2014
DOI: 10.1016/j.sysarc.2013.11.008

Improving branch divergence performance on GPGPU with a new PDOM stack and multi-level warp scheduling

Cited by 1 publication (3 citation statements)
References: 31 publications
“…BinarySearch even suffers performance degradations in the 2-wide configurations due to an increase in the IF stalls. 13 For the kernels of breadth-first search, the largest configurations only achieve 4.46% speed-up for BFS_1 and 7.95% for BFS_2 and this is only ∼1% more than the 2-wide configurations with single FUs.…”
Section: A ILP (citation type: mentioning)
confidence: 95%
“…This also allows threads to issue memory operations with high spatial locality resulting in data traffic optimization in the memory hierarchy. These constraints have little effect on highly-regular graphic shader programs, but throughput can dramatically decrease in the presence of control-flow with bespoke solutions proposed to alleviate thread divergence [12] [13]. System designers have looked into building systems with many cores that are not multi-threaded [14][15], but this approach still does not address the fact that not all problems can be solved effectively in the same manner.…”
Section: A Motivation (citation type: mentioning)
confidence: 99%