2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security
DOI: 10.1109/hpcc.2014.30
LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU

Cited by 42 publications (23 citation statements) | References 2 publications
“…The principle is to have a for loop iterating over the matrices, and within this loop, compute the factorization of the matrix. This is also the approach used in [5], [6].…”
Section: Batch (mentioning confidence: 99%)
“…The use of batched algorithms [8] to launch multiple network integration kernels for each GPU streaming multiprocessor.…”
Section: Leveraging Modern Hardware: GPU Acceleration (mentioning confidence: 99%)
“…Some vendors have started to provide some batched functionalities in their numerical libraries (e.g., NVIDIA's CUBLAS and Intel's Math Kernel Library [MKL]). Additionally, some open-source libraries from the HPC community (e.g., the Matrix Algebra on GPU and Multicore Architectures [MAGMA] library [37]) have also started to deliver batched routines [11], [12], [18]. While performance has been improving with these contributions, there is still a lack of understanding of how to design, implement, analyze, and optimize batched routines to exploit modern architectures at full efficiency.…”
Section: Introduction (mentioning confidence: 99%)