2011 IEEE 17th International Symposium on High Performance Computer Architecture 2011
DOI: 10.1109/hpca.2011.5749714

Thread block compaction for efficient SIMT control flow

Cited by 137 publications (138 citation statements)
References 16 publications
“…In this paper we address a significant issue with previously proposed compaction mechanisms [8,19] that hinders their effectiveness. In order to identify candidates for compaction, hardware stalls all warps within a CTA on any potentially divergent branch until all warps reach the branch point.…”
Section: Introduction (mentioning)
confidence: 99%
“…This is because only a single instruction is issued to all SIMD lanes, implying that only a subset of the lanes should actually execute operations and commit results. Recent research has shown that the impact of this control divergence problem can be reduced by dynamically forming SIMD-instructions from large collections of threads [8,19]. These collections of threads are called cooperating thread arrays (CTAs) or thread blocks by NVIDIA's CUDA [21] and workgroups by OpenCL [3].…”
Section: Introduction (mentioning)
confidence: 99%
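
The citation statement above describes forming SIMD instructions dynamically from a larger collection of threads so that fewer lanes sit idle after a divergent branch. The Python sketch below is only an illustrative software model of that idea, not the hardware mechanism from the cited paper. It assumes that each thread must stay in its original lane when warps are regrouped; that constraint, the function name `compact_warps`, and the list-of-masks representation are all assumptions made for this example.

```python
from typing import List


def compact_warps(masks: List[List[bool]]) -> List[List[int]]:
    """Regroup the active threads of one thread block into fewer warps.

    masks[w][lane] is True when the thread of warp w in that lane follows
    the branch path under consideration. Each returned warp is a list with
    one slot per lane, holding the index of the source warp that supplies
    the thread for that lane, or -1 if the lane is idle.
    Assumption: a thread may only be placed in its original lane.
    """
    if not masks:
        return []
    warp_size = len(masks[0])
    # For every lane, collect the source warps that have an active thread there.
    per_lane = [
        [w for w, mask in enumerate(masks) if mask[lane]]
        for lane in range(warp_size)
    ]
    # The busiest lane determines how many compacted warps are needed.
    n_out = max((len(candidates) for candidates in per_lane), default=0)
    return [
        [per_lane[lane][i] if i < len(per_lane[lane]) else -1
         for lane in range(warp_size)]
        for i in range(n_out)
    ]


# Three 4-lane warps, each only partially active after a divergent branch.
masks = [
    [True, False, True, False],
    [False, True, False, False],
    [True, False, False, True],
]
compacted = compact_warps(masks)
issued_without_compaction = sum(1 for mask in masks if any(mask))
print(issued_without_compaction, len(compacted))
```

In this toy case all three original warps would have to issue the instruction with some lanes masked off, while the regrouped version needs two warps. The saving depends entirely on how the active threads are spread across lanes.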
“…A similar mechanism was proposed in 2011 for SIMT architectures (W. Fung, Aamodt, 2011). In that proposal, the divergence-management logic is implemented in hardware rather than in software, and relies on masks rather than multiple PCs.…”
Section: C* Pour Hypercube (unclassified)
“…Different solutions have been proposed at the software [41] and hardware levels [12][13] [35]. Here, we propose to use the scalar unit to eliminate control divergence at run time.…”
Section: Collaborative Execution Paradigm II: Control Divergence (mentioning)
confidence: 99%