Pieter Ghysels scite author profile

In the generalized minimal residual method (GMRES), the global all-to-all communication required in each iteration for orthogonalization and normalization of the Krylov base vectors is becoming a performance bottleneck on massively parallel machines. Long latencies, system noise, and load imbalance cause these global reductions to become very costly global synchronizations. In this work, we propose the use of nonblocking or asynchronous global reductions to hide these global communication latencies by overlapping them with other communications and calculations. A pipelined variation of GMRES is presented in which the result of a global reduction is used only one or more iterations after the communication phase has started. This way, global synchronization is relaxed and scalability is much improved at the expense of some extra computations. The numerical instabilities that inevitably arise due to the typical monomial basis by powering the matrix are reduced and often annihilated by using Newton or Chebyshev bases instead. Our parallel experiments on a medium-sized cluster show significant speedups of the pipelined solvers compared to standard GMRES. An analytical model is used to extrapolate the performance to future exascale systems.
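The overlap described in this abstract can be pictured with a toy, single-machine sketch. It is not the authors' implementation: a thread pool stands in for a nonblocking global reduction (in an MPI code a nonblocking all-reduce would presumably play this role), and the names `local_dot`, `global_sum`, `laplacian_matvec`, and `overlapped_step` are hypothetical, as is the choice of a 1D Laplacian for the overlapped work.

```python
from concurrent.futures import ThreadPoolExecutor


def local_dot(x, y):
    """Partial dot product computed by one (simulated) process."""
    return sum(a * b for a, b in zip(x, y))


def global_sum(partials):
    """Stand-in for the global reduction across all processes (assumption)."""
    return sum(partials)


def laplacian_matvec(v):
    """y = A v for the 1D Laplacian tridiag(-1, 2, -1), used as the overlapped work."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def overlapped_step(v, w, num_procs=2):
    """Start the reduction for <v, w>, compute A v meanwhile, then wait."""
    size = (len(v) + num_procs - 1) // num_procs
    partials = [
        local_dot(v[k:k + size], w[k:k + size]) for k in range(0, len(v), size)
    ]
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(global_sum, partials)  # reduction starts, no waiting
        next_v = laplacian_matvec(v)                 # work that overlaps with it
        dot = pending.result()                       # synchronize only when needed
    return dot, next_v
```

The point of the sketch is only the ordering: the reduction result is requested after the matrix-vector product has been issued, so the wait is deferred to the moment the value is actually needed.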

A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

Rouet

Ghysels

et al. 2016

ACM Trans. Math. Softw.

117

We present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, e.g., finite element methods and boundary element methods. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of the structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. This work is part of a more global effort, the STRUMPACK (STRUctured Matrices PACKage) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
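As a rough illustration of randomized sampling for a low-rank block, the sketch below multiplies a block by a few random vectors and orthonormalizes the samples to obtain a basis for its column space. It is a generic range finder under stated assumptions, not the HSS compression or the adaptive mechanism of the paper: the function names, the tolerance `tol`, and the idea of dropping negligible samples are all hypothetical choices made for the example.

```python
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def norm(v):
    return sum(x * x for x in v) ** 0.5


def randomized_range(B, samples, tol=1e-10, seed=0):
    """Orthonormal vectors spanning the numerical column space of B.

    The basis is built from the sampled columns of B @ Omega, where Omega
    is a Gaussian random matrix with `samples` columns.
    """
    rng = random.Random(seed)
    n = len(B[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(samples)] for _ in range(n)]
    sampled = [list(col) for col in zip(*matmul(B, omega))]
    basis = []
    for y in sampled:
        scale = norm(y)
        for _ in range(2):  # two Gram-Schmidt passes for numerical stability
            for q in basis:
                c = sum(a * b for a, b in zip(q, y))
                y = [a - c * b for a, b in zip(y, q)]
        r = norm(y)
        if r > tol * scale:  # keep only samples that add a new direction
            basis.append([a / r for a in y])
    return basis


def projection_error(B, basis):
    """Frobenius norm of B - Q Q^T B, with Q given by the basis vectors."""
    err2 = 0.0
    for col in zip(*B):
        col = list(col)
        for q in basis:
            c = sum(a * b for a, b in zip(q, col))
            col = [a - c * b for a, b in zip(col, q)]
        err2 += sum(x * x for x in col)
    return err2 ** 0.5


# Synthetic 12 x 10 block of exact rank 2 (made up for this example).
block = [[1.0 + (i + 1) * (j + 1) for j in range(10)] for i in range(12)]
basis = randomized_range(block, samples=4)
```

Here four random samples are drawn for a block of rank 2, and the surplus samples are discarded because they add no new direction; `projection_error(block, basis)` then measures how well the retained basis reproduces the block.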

A particle-based model to simulate the micromechanics of single-plant parenchyma cells and aggregates

et al. 2010

This paper addresses how plant tissue mechanics is related to the micromechanics of cells. To this end, we propose a mesh-free particle method to simulate the mechanics of both individual plant cells (parenchyma) and cell aggregates in response to external stresses. The model considers two important features of the plant cell: (1) the cell protoplasm, the interior liquid phase inducing hydrodynamic phenomena, and (2) the cell wall material, a viscoelastic solid material that contains the protoplasm. In this particle framework, the cell fluid is modeled by smoothed particle hydrodynamics (SPH), a mesh-free method typically used to address problems in gas and fluid dynamics. In the solid phase (cell wall), on the other hand, the particles are connected by pairwise interactions that hold them together and prevent the fluid from penetrating the cell wall. The cell wall hydraulic conductivity (permeability) is built in as well through the SPH formulation. Although this model is also meant to be able to deal with dynamic and even violent situations (leading to cell wall rupture or cell-cell debonding), we have concentrated on quasi-static conditions. The results of single-cell compression simulations show that the conclusions found by analytical models and experiments can be reproduced at least qualitatively. Relaxation tests revealed that plant cells have short relaxation times (1 µs to 10 µs) compared to mammalian cells. Simulations performed on cell aggregates indicated an influence of the cellular organization on the tissue response, as was also observed in experiments done on tissues with a similar structure.
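For readers unfamiliar with SPH, the basic building block is a kernel-weighted sum over neighboring particles. The sketch below evaluates the standard SPH density summation in one dimension. The abstract does not say which kernel the model uses, so the cubic spline kernel here is an assumption (it is a common choice), and the function names, particle spacing, and reference density are made up for the example.

```python
def cubic_spline_1d(r, h):
    """Cubic spline SPH kernel in one dimension, with support radius 2h."""
    q = abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1D normalization so the kernel integrates to 1
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def sph_density(positions, mass, h):
    """Density at each particle: rho_i = sum_j m * W(x_i - x_j, h)."""
    return [
        sum(mass * cubic_spline_1d(xi - xj, h) for xj in positions)
        for xi in positions
    ]


# Equally spaced particles representing a fluid of reference density rho0.
dx = 0.1
rho0 = 1000.0
positions = [i * dx for i in range(11)]
density = sph_density(positions, mass=rho0 * dx, h=dx)
```

With the smoothing length equal to the particle spacing, interior particles recover the reference density, while particles at the ends of the row see fewer neighbors and report a lower value. This boundary deficiency is a general property of the plain summation, not a statement about how the paper treats the cell wall.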

An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling

Ghysels¹,

Li²,

Rouet³

et al. 2016

SIAM J. Sci. Comput.

125

We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared-memory parallel systems, including the Intel® Xeon Phi (MIC). The code is part of a software package called STRUMPACK (STRUctured Matrices PACKage), which also has a distributed-memory component for dense rank-structured matrices.

Pieter Ghysels

Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm

Hiding Global Communication Latency in the GMRES Algorithm on Massively Parallel Machines

A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization

A particle-based model to simulate the micromechanics of single-plant parenchyma cells and aggregates

An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling
