Proceedings of the 40th Annual International Symposium on Computer Architecture 2013
DOI: 10.1145/2485922.2485954

SIMD divergence optimization through intra-warp compaction

Abstract: SIMD execution units in GPUs are increasingly used for high performance and energy efficient acceleration of general purpose applications. However, SIMD control flow divergence effects can result in reduced execution efficiency in a class of GPGPU applications, classified as divergent applications. Improving SIMD efficiency, therefore, has the potential to bring significant performance and energy benefits to a wide range of such data parallel applications. Recently, the SIMD divergence problem has received incr…

Cited by 27 publications (8 citation statements)
References 22 publications
“…The main issue with predicated execution in SIMD architectures is its low energy efficiency. Measured mask density 2 is between 18-20% on typical benchmarks [23], [24], [25]. This means that sparse predicated masks are common on modern codes.…”
Section: The Divergence Control Flow Problem
Citation type: mentioning
confidence: 99%
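The mask density figure quoted in the statement above can be made concrete with a small sketch. The citing paper's own definition sits in a footnote that is not reproduced here, so this assumes that mask density means the fraction of SIMD lanes whose predicate bit is set; the names `mask_density` and `average_density` are hypothetical and chosen only for illustration.

```python
from typing import Iterable


def mask_density(mask: int, width: int) -> float:
    """Fraction of active lanes in a `width`-lane predicate mask.

    Assumption: bit i of `mask` is 1 when lane i is active.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Ignore any bits above the SIMD width, then count the set bits.
    active = bin(mask & ((1 << width) - 1)).count("1")
    return active / width


def average_density(masks: Iterable[int], width: int) -> float:
    """Mean mask density over a sequence of predicated instructions."""
    masks = list(masks)
    if not masks:
        raise ValueError("at least one mask is required")
    return sum(mask_density(m, width) for m in masks) / len(masks)


if __name__ == "__main__":
    # A 16-lane instruction with 3 active lanes has density 3/16,
    # which falls inside the 18-20% range quoted above.
    print(mask_density(0b0000_0000_0010_0101, 16))
```

Under this assumption, a density near 18-20% means that roughly four out of five lanes do no useful work on a predicated instruction, which is the inefficiency the statement describes.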
“…The hard timeout policy is implicit in every scenario. In the x-axis the number of cycles for each timeout policy control flow divergence [23], [24] does. Moreover, Vaidya et al [24] also demonstrate that the true-value position inside the mask register leads to no variability in performance.…”
Section: Benchmarks
Citation type: mentioning
confidence: 99%
“…To solve the problem of low performance of parallel data calculations [12] in industrial power applications, it is proposed to add a Vector processor hardware implementation named VPU in the TS800, which can support adaptive controllers Reinforcement learning and learning-based [13] underlying algorithm requirements. This design can support FFT and IFT of 64~4096 point [14].…”
Section: Introduction Risc-v
Citation type: mentioning
confidence: 99%