Numerical Computations With GPUs 2014
DOI: 10.1007/978-3-319-06548-9_1

Accelerating Numerical Dense Linear Algebra Calculations with GPUs

Cited by 82 publications (67 citation statements) · References 10 publications
“…We believe that this choice guarantees good load balancing between CPU and GPU, and it is a priority to increase the number of GPUs within the system to improve performance. Furthermore, regarding the actual performance achievable with a CPU, it is important to note that only fractions of peak performance not exceeding 50% are reasonably achievable on real problems (see, e.g., the work of Dongarra), while for regular computation on a GPU it is possible to achieve actual performance very close to peak (see, e.g., the work of Dongarra et al). All these issues make it reasonable to assume that the actual performance of the NVIDIA Tesla K40 GPU can be up to ten times the actual performance of 16 cores of the XEON E5‐2680v2 CPU.…”
Section: Test Results (citation type: mentioning)
confidence: 99%
“…However, this often comes with the cost of complicated installations and extensive application refactoring. MAGMA [5] provides powerful, intelligently scheduled BLAS and LAPACK algorithms but, due to its dependency on external libraries, is difficult to install, configure, and tune, and does not yet provide unified or consistent capability across its CUDA, OpenCL, and Intel MIC implementations. Although MAGMA supports multi-GPU BLAS kernels, there is no built-in support for interoperability across different hardware accelerators, e.g.…”
Section: Results (citation type: mentioning)
confidence: 99%
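For readers unfamiliar with MAGMA, its LAPACK-style interface mentioned in the excerpt can be illustrated with a minimal LU factorization call. This is a sketch assuming the MAGMA 2.x C API (magma_init, magma_dgetrf, magma_finalize); exact headers and signatures vary across versions and builds, so treat it as illustrative rather than canonical.

```c
/* Minimal sketch of MAGMA's CPU-interface LU factorization, which runs
 * as a hybrid CPU+GPU algorithm under the hood.  Assumes MAGMA 2.x. */
#include <stdlib.h>
#include <stdio.h>
#include "magma_v2.h"

int main(void) {
    magma_init();                                   /* set up MAGMA/GPU */

    magma_int_t n = 1024, lda = n, info = 0;
    double      *A   = malloc((size_t)lda * n * sizeof(double));
    magma_int_t *ipiv = malloc((size_t)n * sizeof(magma_int_t));

    for (magma_int_t i = 0; i < lda * n; ++i)       /* dummy test matrix */
        A[i] = rand() / (double)RAND_MAX;

    /* LAPACK-style LU with partial pivoting: MAGMA schedules the panel
     * factorization on the CPU and the trailing updates on the GPU.   */
    magma_dgetrf(n, n, A, lda, ipiv, &info);
    printf("magma_dgetrf info = %lld\n", (long long)info);

    free(A); free(ipiv);
    magma_finalize();
    return 0;
}
```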
“…We demonstrate that MetaMorph significantly reduces development time for heterogeneous systems without performance penalty and can be used to seamlessly utilize all the available hardware accelerators across multiple compute nodes, which include multicore CPUs, Intel MICs, AMD GPUs, and NVIDIA GPUs. In addition, we show MetaMorph's interoperability with hardware vendors' libraries and third-party libraries such as clBLAS [3], Intel MKL [4] and MAGMA libraries [5] (Section IV).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Performance tuning in such cases involves selecting a number of parameters that are highly system dependent, particularly for heterogeneous computers [24]. While this is a viable approach for supercomputing applications, it becomes impractical for individual workstations commonly used to process hyperspectral data from bench-top systems. For example, processing the same data sets on various workstations demonstrates unique profile curves for the same data set that are dependent on the batch size used to break up the input stream (Fig.
Section: Methods (citation type: mentioning)
confidence: 99%
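The batch-size dependence described in this excerpt lends itself to simple empirical tuning: run the same fixed workload at several candidate batch sizes, time each, and keep the fastest. The sketch below is illustrative only; process_stream is a hypothetical stand-in for the hyperspectral kernel, not an API from the cited work, and on a real system the timed call would be the GPU pipeline.

```c
/* Empirical batch-size tuning: profile a fixed workload at several
 * candidate batch sizes and report the fastest on this machine. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical kernel: stream over a fixed buffer in batch-sized chunks. */
static double process_stream(const double *data, size_t n, size_t batch) {
    double acc = 0.0;
    for (size_t off = 0; off < n; off += batch) {
        size_t end = (off + batch < n) ? off + batch : n;
        for (size_t i = off; i < end; ++i)
            acc += data[i] * data[i];
    }
    return acc;
}

int main(void) {
    const size_t n = (size_t)1 << 24;            /* fixed total workload */
    double *data = malloc(n * sizeof *data);
    for (size_t i = 0; i < n; ++i) data[i] = (double)i;

    const size_t candidates[] = {64, 256, 1024, 4096, 16384};
    size_t best = 0;
    double best_t = 1e30, sink = 0.0;

    for (size_t c = 0; c < sizeof candidates / sizeof *candidates; ++c) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        sink += process_stream(data, n, candidates[c]);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        printf("batch %6zu: %.4f s\n", candidates[c], dt);
        if (dt < best_t) { best_t = dt; best = candidates[c]; }
    }
    printf("best batch size: %zu (checksum %g)\n", best, sink);
    free(data);
    return 0;
}
```

Because the best value depends on cache sizes, memory bandwidth, and any attached accelerator, the profile curve (and the winning batch size) differs from workstation to workstation, which is exactly the system dependence the excerpt describes.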