2020
DOI: 10.1098/rsta.2019.0055

Hierarchical algorithms on hierarchical architectures

Abstract: A traditional goal of algorithmic optimality, squeezing out flops, has been superseded by evolution in architecture. Flops no longer serve as a reasonable proxy for all aspects of complexity. Instead, algorithms must now squeeze memory, data transfers, and synchronizations, while extra flops on locally cached data represent only small costs in time and energy. Hierarchically low-rank matrices realize a rarely achieved combination of optimal storage complexity and high computational intensity for a wide…
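To make the storage argument concrete, here is a minimal sketch (not from the paper; numpy-based, with illustrative sizes and tolerances) of the idea behind hierarchically low-rank matrices: off-diagonal blocks of a matrix generated by a smooth kernel admit accurate low-rank factorizations, so they can be stored as thin factors instead of dense blocks.

```python
# Minimal sketch (illustrative, not the paper's code): compress one
# off-diagonal block of a smooth-kernel matrix by truncated SVD and
# compare its factored storage against dense storage.
import numpy as np

n, tol = 1024, 1e-8
x = np.linspace(0.0, 1.0, n)
# Smooth kernel: K[i, j] = 1 / (1 + |x_i - x_j|)
K = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))

def truncated_svd(block, tol):
    """Return factors U, V with block ~= U @ V.T to relative tolerance tol."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    k = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :k] * s[:k], Vt[:k].T

h = n // 2
off = K[:h, h:]                # one off-diagonal block
U, V = truncated_svd(off, tol)
dense_cost = off.size          # entries stored densely
lr_cost = U.size + V.size      # entries stored as thin factors
print(f"rank {U.shape[1]}, storage {lr_cost}/{dense_cost} entries")
```

On this example the off-diagonal block compresses to a rank far below n/2; applied recursively over a block hierarchy, the same effect is what yields near-linear storage.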

Cited by 24 publications (18 citation statements) · References 32 publications
“…The original FMM is kernel-dependent, but several kernel-independent methods have since been proposed, such as the kernel-independent FMM, e.g., [70], hierarchical matrices, or H² matrices. For a discussion of hierarchical matrices see, e.g., [71], [72], [73], and [74].…”
Section: Exploiting Data Sparsity
confidence: 99%
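As an illustration of the kernel-independent idea this statement refers to, here is a minimal sketch (my own illustration, not the method of [70]) of adaptive cross approximation (ACA) with full pivoting: it builds a low-rank approximation of a block from its entries alone, without any knowledge of the kernel formula.

```python
import numpy as np

def aca_full(block, tol, max_rank=64):
    """Full-pivoted ACA: approximate `block` as U @ V from entries alone.

    Stops when the largest remaining residual entry falls below `tol`
    times the first pivot, or when `max_rank` is reached.
    """
    R = np.array(block, dtype=float)       # residual, updated in place
    us, vs = [], []
    first_pivot = None
    for _ in range(max_rank):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        pivot = R[i, j]
        if first_pivot is None:
            first_pivot = abs(pivot)
        if abs(pivot) <= tol * first_pivot:
            break
        u = R[:, j].copy()                 # one column of the cross ...
        v = R[i, :] / pivot                # ... and one scaled row
        us.append(u)
        vs.append(v)
        R -= np.outer(u, v)                # peel off the rank-1 term
    U = np.column_stack(us) if us else np.zeros((R.shape[0], 0))
    V = np.vstack(vs) if vs else np.zeros((0, R.shape[1]))
    return U, V
```

Full pivoting scans the whole residual and is shown only for clarity; practical ACA variants use partial pivoting so that only a few rows and columns of the block are ever evaluated.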
“…which produces the sum of a hierarchical matrix A of size N × N and a globally low-rank matrix whose X and Y factors are of size N × k with k ≪ N. This operation can be efficiently implemented [30,54] by first adding the contributions of XYᵀ to the various blocks of A at all levels, and recompressing the resulting sum algebraically as described earlier. The low-rank update operation is a key routine for an operation that generates an explicit hierarchical matrix representation of an operator accessible only via matrix–vector products.…”
Section: General Linear Algebra Operations on Hierarchical Matrices
confidence: 99%
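Here is a minimal sketch of the recompression step behind such a low-rank update (assumed helper names; not the routine of [30,54]): when a block stored as U·Vᵀ absorbs its slice Xb·Ybᵀ of the global update, the concatenated factors are reorthogonalized with QR and truncated through an SVD of a small core matrix, so the dense block is never formed.

```python
import numpy as np

def lowrank_update(U, V, Xb, Yb, tol):
    """Recompress U @ V.T + Xb @ Yb.T to tolerance tol without densifying.

    Assumes the block dimensions exceed the combined rank, so reduced QR
    yields orthonormal columns.
    """
    # The sum equals [U Xb] @ [V Yb].T: just concatenate the factors.
    L = np.hstack([U, Xb])
    R = np.hstack([V, Yb])
    # Orthogonalize both sides, then SVD the small core matrix.
    Ql, Rl = np.linalg.qr(L)
    Qr, Rr = np.linalg.qr(R)
    W, s, Zt = np.linalg.svd(Rl @ Rr.T)
    k = max(1, int(np.sum(s > tol * s[0])))   # truncation rank
    return Ql @ (W[:, :k] * s[:k]), Qr @ Zt[:k].T
```

Applied blockwise at every level of the hierarchy, this keeps ranks near-minimal after each update, which is what makes constructing an explicit hierarchical representation from matrix–vector products affordable.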
“…Modern scientific workstations generally feature manycore GPU accelerators, and algorithms that do not effectively take advantage of these architectures are unlikely to be competitive for scientific and financial applications. Modern GPU architectures feature decreasing ratios of memory bandwidth to processing power, smaller amounts of fast memory per processing core, and substantial latencies for accessing data in deep memory [30]. Competitive algorithms must therefore be able to orchestrate their computations for effective execution in this environment.…”
Section: Introduction
confidence: 99%
“…• In our work we tackle multicore architectures. There exist recent research efforts to execute H-matrix operations on distributed systems [74,116]. In fact, we have already developed a distributed-memory implementation of the H-Chameleon library.…”
Section: Open Research Lines
confidence: 99%