2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3), 2019
DOI: 10.1109/ia349570.2019.00014

Stretching Jacobi: Two-Stage Pivoting in Block-Based Factorization

Cited by 2 publications (3 citation statements). References 21 publications.
“…Due to the higher bandwidth of registers, strictly following this idiom can lead to high speedups. The prime use case for this idiom are batched computations on small portions of data held in registers, e.g., batched GEMM or batched matrix factorizations [5,48]. In those examples, each thread inside a warp loads, e.g., a row of the warps' matrix and uses shuffle whenever it needs access to the row stored by another thread.…”
Section: The Warp Register Cache Idiom (mentioning)
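The register-cache pattern described in the quotation can be illustrated with a small plain-Python simulation. This is a sketch under stated assumptions, not code from the paper or from the citing works. `shfl` and `shfl_xor` are hypothetical stand-ins for CUDA's `__shfl_sync` and `__shfl_xor_sync` intrinsics. Each of the 32 lanes keeps one row of a small matrix in its "registers," and all cross-row data movement happens through shuffles. The example factorization is ordinary LU with partial pivoting, where the pivot is found by a shuffle-based warp argmax. It is chosen only to illustrate the idiom and is not the two-stage pivoting scheme the paper proposes. All function names (`warp_argmax`, `shfl_rows`, `warp_lu`) are illustrative.

```python
FULL_WARP = 32  # lanes per warp on NVIDIA GPUs

def shfl(var, src_lane):
    """Simulate __shfl_sync(0xffffffff, var, src_lane[lane]) for all lanes:
    var[lane] is the register held by `lane`; each lane reads var[src_lane[lane]]."""
    return [var[src_lane[lane]] for lane in range(FULL_WARP)]

def shfl_xor(var, mask):
    """Simulate __shfl_xor_sync: each lane reads the register of lane ^ mask."""
    return [var[lane ^ mask] for lane in range(FULL_WARP)]

def warp_argmax(val, idx):
    """Butterfly reduction; afterwards every lane holds the same (max, lane)."""
    val, idx = list(val), list(idx)
    offset = FULL_WARP // 2
    while offset > 0:
        other_val, other_idx = shfl_xor(val, offset), shfl_xor(idx, offset)
        for lane in range(FULL_WARP):
            # Ties go to the lower lane index so all lanes agree on the result.
            if (other_val[lane], -other_idx[lane]) > (val[lane], -idx[lane]):
                val[lane], idx[lane] = other_val[lane], other_idx[lane]
        offset //= 2
    return val, idx

def shfl_rows(regs, src, cols):
    """Move whole rows between lanes: one shuffle per register (column)."""
    out = [[0.0] * cols for _ in range(FULL_WARP)]
    for j in range(cols):
        got = shfl([regs[lane][j] for lane in range(FULL_WARP)], src)
        for lane in range(FULL_WARP):
            out[lane][j] = got[lane]
    return out

def warp_lu(a):
    """LU with partial pivoting of one n x n matrix (n <= 32) by one warp.
    Lane i holds row i; lanes >= n stay idle. Returns (packed LU rows, perm),
    where perm[i] is the original row that ended up in row i."""
    n = len(a)
    assert n <= FULL_WARP
    regs = [[float(x) for x in a[lane]] if lane < n else [0.0] * n
            for lane in range(FULL_WARP)]
    perm = list(range(FULL_WARP))  # each lane remembers its original row id
    for k in range(n):
        # 1) Pivot search: lanes k..n-1 offer |a_ik|; other lanes offer -1.
        cand = [abs(regs[l][k]) if k <= l < n else -1.0 for l in range(FULL_WARP)]
        _, piv = warp_argmax(cand, list(range(FULL_WARP)))
        p = piv[0]  # identical in every lane after the butterfly
        # 2) Row swap: lanes k and p exchange their register rows via shuffles.
        src = [p if l == k else k if l == p else l for l in range(FULL_WARP)]
        regs, perm = shfl_rows(regs, src, n), shfl(perm, src)
        # 3) Broadcast the pivot row held by lane k to every lane.
        pivot = shfl_rows(regs, [k] * FULL_WARP, n)
        # 4) Lanes below the pivot eliminate using their own registers only.
        for l in range(k + 1, n):
            m = regs[l][k] / pivot[l][k]
            regs[l][k] = m  # store the L multiplier in place
            for j in range(k + 1, n):
                regs[l][j] -= m * pivot[l][j]
    return regs[:n], perm[:n]
```

For example, `warp_lu([[1, 2], [3, 4]])` swaps the two rows and returns the packed factors `[[3.0, 4.0], [1/3, 2/3]]` with `perm == [1, 0]`. In an actual CUDA kernel, each lane's row would be a fixed-size register array, and every `shfl` call above would correspond to one `__shfl_sync(0xffffffff, value, srcLane)`. This avoids shared memory entirely, which is the source of the speedups the quotation mentions.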
“…The warp register cache idiom has become mainstream in the CUDA community [5,16,48,50], having been generalized into "task-parallel programming for warps" [8].…”
Section: Related Work (mentioning)