2014
DOI: 10.1016/j.parco.2014.03.001

Adaptive block size for dense QR factorization in hybrid CPU–GPU systems via statistical modeling

Cited by 9 publications (3 citation statements)
References 10 publications

“…Heterogeneous CPU-accelerator systems have recently been widely used in high-performance computing and cloud computing because of their high performance and low power consumption. Many works have focused on utilizing both CPUs and accelerators to accelerate specific applications, such as matrix multiplication [1], sparse matrix-vector multiplication [2], QR factorization [3], Cholesky factorization [4], the branch-and-bound algorithm [5], the Smith-Waterman algorithm [6], the subset-sum problem [7], particle swarm optimization [8], graph processing [9], range queries [10], computational fluid dynamics [11], and atmospheric numerical simulation [12]. These works demonstrate that CPU-accelerator co-processing yields better performance than CPU-only or accelerator-only execution.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…CPU-GPU cooperative computing has recently attracted the attention of many researchers and application developers. Some applications have been reported to successfully implement CPU-GPU cooperative computing, instead of CPU-only or GPU-only computing, such as matrix multiplication, fast Fourier transformation, LU factorization, QR factorization, unsymmetric sparse linear systems, radiation physics, molecular dynamics, the conjugate gradient method, divide-and-conquer algorithms, and branch-and-bound algorithms. These works show that CPU-GPU cooperative computing delivers much better performance than CPU-only or GPU-only computing.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Automatic performance tuning of matrix libraries has been studied from various aspects. There are approaches based on exhaustive search [15,16], incremental parameter sampling [17], statistical models [18,19], and machine learning [20,21], to mention a few. Among them, the approach of ATMathCoreLib is unique in that it is targeted at the finite-horizon problem; it is designed to finish auto-tuning in a specified number of executions and minimize the total execution time.…”
Citation type: mentioning
confidence: 99%
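
To make the finite-horizon idea in the last statement concrete, here is a minimal sketch of how such a tuner could work: given a fixed budget of executions and a set of candidate block sizes, it keeps simple per-candidate statistics (sample count and running mean of measured time) and trades exploration against exploitation so that the total accumulated time stays low. The selection rule, the function names (`finite_horizon_autotune`, `run`), and the toy timings below are illustrative assumptions, not ATMathCoreLib's actual algorithm or the cited paper's statistical model.

```python
import math
import random

def finite_horizon_autotune(run, candidates, budget):
    """Choose a candidate (e.g., a block size) for each of `budget`
    executions so that the total accumulated run time stays low.

    `run(c)` executes the real workload with candidate `c` and returns
    its measured time.  Per-candidate statistics (sample count and
    running mean) drive an optimistic selection rule: candidates with
    few samples get an exploration bonus that decays as the remaining
    budget shrinks.
    """
    stats = {c: {"n": 0, "mean": 0.0} for c in candidates}
    total_time = 0.0
    for step in range(budget):
        # Sample every candidate once before trusting the model.
        untried = [c for c in candidates if stats[c]["n"] == 0]
        if untried:
            choice = untried[0]
        else:
            remaining = budget - step
            def score(c):
                # Lower confidence bound on time: a small mean (fast) or
                # a small sample count (uncertain) both lower the score.
                s = stats[c]
                bonus = math.sqrt(math.log(step + 1) / s["n"])
                return s["mean"] - bonus * (remaining / budget)
            choice = min(candidates, key=score)
        t = run(choice)
        total_time += t
        s = stats[choice]
        s["n"] += 1
        s["mean"] += (t - s["mean"]) / s["n"]  # incremental mean update
    best = min(candidates, key=lambda c: stats[c]["mean"])
    return best, total_time

# Toy usage: block size 128 is fastest here, with noisy timings.
if __name__ == "__main__":
    true_time = {64: 1.3, 128: 1.0, 256: 1.1, 512: 1.5}
    def run(block_size):
        return true_time[block_size] + random.uniform(-0.05, 0.05)
    best, total = finite_horizon_autotune(run, [64, 128, 256, 512], budget=30)
    print(f"best block size: {best}, total time: {total:.2f}")
```

The decaying exploration bonus is what makes the horizon finite in effect: early on it pays to sample uncertain candidates, but as the remaining budget shrinks, the tuner shifts toward the candidate with the best observed mean, keeping the total execution time over the fixed number of runs low.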