2015
DOI: 10.1007/978-3-658-10113-8

Automatic SIMD Vectorization of SSA-based Control Flow Graphs

Abstract: First and foremost, I want to express my gratitude towards my advisor Prof. Dr. Sebastian Hack. Not only did he give me the opportunity to pursue a PhD in his group, he also gave me all the necessary freedom, boundaries, and advice to finally succeed. Furthermore, I would like to thank Prof. Dr. Dr. h.c. mult. Reinhard Wilhelm for reviewing this thesis, and Prof. Dr. Christoph Weidenbach and Dr. Jörg Herter for serving on my committee. I also want to offer my special thanks to Prof. Dr. Philipp Slusallek. He f…

Cited by 14 publications (10 citation statements)
References 63 publications
“…Post-Dominator Reconvergence. A large body of earlier work assumes that diverged threads will not reconverge before the immediate post-dominator (IPD) [Coutinho et al 2011; Farrell and Kieronska 1996; Habermaier and Knapp 2012; Karrenberg 2015]. IPD reconvergence became popular through its adoption in NVIDIA GPUs starting with Tesla [Lindholm et al 2008].…”
Section: Related Work (mentioning)
confidence: 99%
“…Reference [45] detects computations in a program that may be performed simultaneously in SIMD lanes and optimizes for vector register usage to minimize memory traffic. The Region Vectorizer [28] is the first open source vectorizer that works on the whole-function level. After converting a whole function to an SSA-based representation, execution flows are traced, masks generated, and loops reordered.…”
Section: Related Work (mentioning)
confidence: 99%
“…Earlier work [Eichenberger et al 2004; Nuzman et al 2006; Ren et al 2006; Sreraman and Govindarajan 2000] applied strip mining to exploit parallelism across consecutive loop iterations by adopting the technology developed for vector machines [Zima and Chapman 1991]. Over the years, several problems have been addressed, including non-unit array accesses [Nuzman et al 2006], nonunified data alignment [Eichenberger et al 2004; Larsen et al 2002], virtual vectors [Wu et al 2005], polyhedral transformations [Kong et al 2013; Trifunovic et al 2009], branch divergence [Shin 2007; Sujon et al 2013], function calls [Karrenberg and Hack 2011; Karrenberg 2015], and split vectorization [Nuzman et al 2011]. Loop-level vectorization works well for loops with regular computations that exhibit little dependences but often poorly for loops with loop-carried dependences (caused by IAA and ICF in Table II), as evaluated here.…”
Section: Related Work (mentioning)
confidence: 99%