2014
DOI: 10.1145/2692916.2555266

In-place transposition of rectangular matrices on accelerators

Abstract: Matrix transposition is an important algorithmic building block for many numeric algorithms, such as the FFT. It has also been used to convert the storage layout of arrays. With more and more algebra libraries offloaded to GPUs, high-performance in-place transposition becomes necessary. Intuitively, in-place transposition should be a good fit for GPU architectures because of their limited on-board memory capacity and high throughput. However, direct application of CPU in-place transposition algorithms lacks the a…
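For reference, the classic CPU approach the abstract alludes to can be sketched as cycle following. This is a textbook technique, not the paper's GPU algorithm, and the function names below are hypothetical. Element k of a row-major rows×cols array moves to position (k mod cols)·rows + ⌊k / cols⌋ in the transposed array, and the permutation is applied by rotating each cycle exactly once. A cycle-leader test (process a cycle only from its smallest index) keeps extra storage at O(1).

```python
def dest(k, rows, cols):
    """Destination of flat index k when a row-major rows x cols matrix is transposed."""
    i, j = divmod(k, cols)      # (row, column) of element k
    return j * rows + i         # its flat index in the cols x rows result


def transpose_in_place(a, rows, cols):
    """Transpose the row-major rows x cols matrix stored in list `a`, in place."""
    n = rows * cols
    for start in range(n):
        # Cycle-leader test: walk the cycle until we return to `start`
        # or find a smaller index (meaning the cycle was already rotated).
        k = dest(start, rows, cols)
        while k > start:
            k = dest(k, rows, cols)
        if k != start:
            continue
        # Rotate the cycle: each element moves to its destination slot,
        # carrying the displaced element forward in `val`.
        val = a[start]
        k = start
        while True:
            d = dest(k, rows, cols)
            a[d], val = val, a[d]
            k = d
            if k == start:
                break
    return a


# [[0, 1, 2], [3, 4, 5]] becomes [[0, 3], [1, 4], [2, 5]]
print(transpose_in_place(list(range(6)), 2, 3))  # [0, 3, 1, 4, 2, 5]
```

The leader test saves memory at the cost of re-walking cycles, and the accesses jump around the array in a data-dependent order.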

Cited by 12 publications (18 citation statements); references 18 publications.
“…However, these papers do not target TLB related issues which arise when large data structures are processed by the memory bound algorithms. Matrix transposition solutions presented in [3,24,26] include many GPU specific optimisations, yet they also do not consider the impact of the TLB on algorithm performance.…”
Section: Prior Art
“…It allows the rows and columns to be operated on independently, reducing work complexity and auxiliary space. Catanzaro et al compare their implementation to our original 3-stage approach [10], which is improved in the present work. Fig.…”
Section: In-place and Out-of-place Transposition for GPUs
“…In [10] we showed that the 4-stage approach presents some issues that limit its throughput on GPUs. For instance, the transposition 1000 !…”
Section: Full Transposition as a Sequence of Elementary Tiled Transpo…
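The idea of building a full transposition from elementary tiled transpositions can be illustrated with a hypothetical two-step example. It is not the 3- or 4-stage sequence from the paper, whose stage notation is truncated above. View a row-major M×N array with M = m1·m2 as a 3-D array of shape (m1, m2, N). Step 1 transposes each of the m1 row panels (m2×N) independently, giving (m1, N, m2). Step 2 swaps the first two axes, moving contiguous m2-element tiles, giving (N, m1, m2), which is exactly the N×M transpose. Both steps are written out-of-place only to show the index algebra; `permute_dims` is an illustrative helper, not a library API.

```python
from itertools import product


def permute_dims(a, shape, perm):
    """Out-of-place axis permutation of a flat row-major array.

    Axis k of the result is axis perm[k] of the source. Returns (flat list, new shape).
    """
    new_shape = [shape[p] for p in perm]
    # Row-major strides of the source array.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    out = []
    # product() yields indices in row-major order (last axis fastest).
    for idx in product(*(range(s) for s in new_shape)):
        src = sum(idx[k] * strides[perm[k]] for k in range(len(perm)))
        out.append(a[src])
    return out, new_shape


M, N, m1, m2 = 6, 5, 2, 3                        # requires M == m1 * m2
a = list(range(M * N))
full, _ = permute_dims(a, [M, N], [1, 0])        # direct N x M transpose
s1, sh1 = permute_dims(a, [m1, m2, N], [0, 2, 1])  # step 1: transpose each m2 x N panel
s2, sh2 = permute_dims(s1, sh1, [1, 0, 2])       # step 2: move m2-element tiles
print(s2 == full)  # True
```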