Numerical Computations With GPUs 2014
DOI: 10.1007/978-3-319-06548-9_1

Accelerating Numerical Dense Linear Algebra Calculations with GPUs

Cited by 82 publications (67 citation statements) · References 10 publications
“…We believe that this choice guarantees good load balancing between CPU and GPU, and it is a priority to increase the number of GPUs within the system to improve performance. Furthermore, regarding the actual performance achievable with a CPU, it is important to note that only fractions of peak performance not exceeding 50% are reasonably achievable on real problems (see, e.g., the work of Dongarra), while for regular computation on a GPU it is possible to achieve actual performance very close to peak (see, e.g., the work of Dongarra et al). All these issues make it reasonable to assume that the actual performance of the NVIDIA Tesla K40 GPU can be up to ten times the actual performance of 16 cores of the XEON E5‐2680v2 CPU.…”
Section: Test Results (citation type: mentioning)
confidence: 99%
“…However, this often comes with the cost of complicated installations and extensive application refactoring. MAGMA [5] provides powerful, intelligently scheduled BLAS and LAPACK algorithms but, due to its dependency on external libraries, is difficult to install, configure, and tune, and does not yet provide unified or consistent capability across its CUDA, OpenCL, and Intel MIC implementations. Although MAGMA supports multi-GPU BLAS kernels, there is no built-in support for interoperability across different hardware accelerators, e.g.…”
Section: Results (citation type: mentioning)
confidence: 99%
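For readers unfamiliar with MAGMA, its LAPACK-style interface mentioned in the excerpt can be illustrated with a minimal LU factorization call. This is a sketch assuming the MAGMA 2.x C API (magma_init, magma_dgetrf, magma_finalize); exact headers and signatures vary across versions and builds, so treat it as illustrative rather than canonical.

```c
/* Minimal sketch of MAGMA's CPU-interface LU factorization, which runs
 * as a hybrid CPU+GPU algorithm under the hood.  Assumes MAGMA 2.x. */
#include <stdlib.h>
#include <stdio.h>
#include "magma_v2.h"

int main(void) {
    magma_init();                                   /* set up MAGMA/GPU */

    magma_int_t n = 1024, lda = n, info = 0;
    double      *A   = malloc((size_t)lda * n * sizeof(double));
    magma_int_t *ipiv = malloc((size_t)n * sizeof(magma_int_t));

    for (magma_int_t i = 0; i < lda * n; ++i)       /* dummy test matrix */
        A[i] = rand() / (double)RAND_MAX;

    /* LAPACK-style LU with partial pivoting: MAGMA schedules the panel
     * factorization on the CPU and the trailing updates on the GPU.   */
    magma_dgetrf(n, n, A, lda, ipiv, &info);
    printf("magma_dgetrf info = %lld\n", (long long)info);

    free(A); free(ipiv);
    magma_finalize();
    return 0;
}
```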
“…We demonstrate that MetaMorph significantly reduces development time for heterogeneous systems without performance penalty and can be used to seamlessly utilize all the available hardware accelerators across multiple compute nodes, which include multicore CPUs, Intel MICs, AMD GPUs, and NVIDIA GPUs. In addition, we show MetaMorph's interoperability with hardware vendors' libraries and third-party libraries such as clBLAS [3], Intel MKL [4] and MAGMA libraries [5] (Section IV).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Performance tuning in such cases involves selecting a number of parameters that are highly system dependent, particularly for heterogeneous computers [24]. While this is a viable approach for supercomputing applications, it becomes impractical for individual workstations commonly used to process hyperspectral data from bench-top systems. For example, processing the same data sets on various workstations demonstrates unique profile curves for the same data set that are dependent on the batch size used to break up the input stream (Fig.
Section: Methods (citation type: mentioning)
confidence: 99%
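The batch-size dependence described in this excerpt lends itself to simple empirical tuning: run the same fixed workload at several candidate batch sizes, time each, and keep the fastest. The sketch below is illustrative only; process_stream is a hypothetical stand-in for the hyperspectral kernel, not an API from the cited work, and on a real system the timed call would be the GPU pipeline.

```c
/* Empirical batch-size tuning: profile a fixed workload at several
 * candidate batch sizes and report the fastest on this machine. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical kernel: stream over a fixed buffer in batch-sized chunks. */
static double process_stream(const double *data, size_t n, size_t batch) {
    double acc = 0.0;
    for (size_t off = 0; off < n; off += batch) {
        size_t end = (off + batch < n) ? off + batch : n;
        for (size_t i = off; i < end; ++i)
            acc += data[i] * data[i];
    }
    return acc;
}

int main(void) {
    const size_t n = (size_t)1 << 24;            /* fixed total workload */
    double *data = malloc(n * sizeof *data);
    for (size_t i = 0; i < n; ++i) data[i] = (double)i;

    const size_t candidates[] = {64, 256, 1024, 4096, 16384};
    size_t best = 0;
    double best_t = 1e30, sink = 0.0;

    for (size_t c = 0; c < sizeof candidates / sizeof *candidates; ++c) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        sink += process_stream(data, n, candidates[c]);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        printf("batch %6zu: %.4f s\n", candidates[c], dt);
        if (dt < best_t) { best_t = dt; best = candidates[c]; }
    }
    printf("best batch size: %zu (checksum %g)\n", best, sink);
    free(data);
    return 0;
}
```

Because the best value depends on cache sizes, memory bandwidth, and any attached accelerator, the profile curve (and the winning batch size) differs from workstation to workstation, which is exactly the system dependence the excerpt describes.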