Emmanuel Agullo scite author profile

Abstract. The emergence and continuing use of multi-core architectures and graphics processing units require changes in the existing software and sometimes even a redesign of the established algorithms in order to take advantage of now prevailing parallelism. Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) and Matrix Algebra on GPU and Multics Architectures (MAGMA ) are two projects that aims to achieve high performance and portability across a wide range of multi-core architectures and hybrid systems respectively. We present in this document a comparative study of PLASMA's performance against established linear algebra packages and some preliminary results of MAGMA on hybrid multi-core and GPU systems.

show abstract

A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs

Agullo¹,

Augonnet²,

Dongarra³

et al. 2012

View full text Add to dashboard Cite

Comparative study of one-sided factorizations with multiple software packages on multi-core hardware

Agullo

Hadri

Ltaief

et al. 2009

View full text Add to dashboard Cite

The emergence and continuing use of multi-core architectures require changes in the existing software and sometimes even a redesign of the established algorithms in order to take advantage of now prevailing parallelism. The Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) is a project that aims to achieve both high performance and portability across a wide range of multi-core architectures. We present in this paper a comparative study of PLASMA's performance against established linear algebra packages (LAPACK and ScaLAPACK), against new approaches at parallel execution (Task Based Linear Algebra Subroutines -TBLAS), and against equivalent commercial software offerings (MKL, ESSL and PESSL). Our experiments were conducted on one-sided linear algebra factorizations (LU, QR and Cholesky) and used multi-core architectures (based on Intel Xeon EMT64 and IBM Power6). The performance results show improvements brought by new algorithms on up to 32 cores -the largest multi-core system we could access.

show abstract

Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model

Agullo¹,

Aumage²,

Faverge³

et al. 2024

IEEE Trans. Parallel Distrib. Syst.

View full text Add to dashboard Cite

The emergence of accelerators as standard computing resources on supercomputers and the subsequent architectural complexity increase revived the need for high-level parallel programming paradigms. Sequential task-based programming model has been shown to efficiently meet this challenge on a single multicore node possibly enhanced with accelerators, which motivated its support in the OpenMP 4.0 standard. In this paper, we show that this paradigm can also be employed to achieve high performance on modern supercomputers composed of multiple such nodes, with extremely limited changes in the user code. To prove this claim, we have extended the StarPU runtime system with an advanced inter-node data management layer that supports this model by posting communications automatically. We illustrate our discussion with the task-based tile Cholesky algorithm that we implemented on top of this new runtime system layer. We show that it enables very high productivity while achieving a performance competitive with both the pure Message Passing Interface (MPI)-based ScaLAPACK Cholesky reference implementation and the DPLASMA Cholesky code, which implements another (non-sequential) task-based programming paradigm.

show abstract

Task-Based FMM for Multicore Architectures

Agullo¹,

Bramas²,

Coulaud³

et al. 2014

SIAM J. Sci. Comput.

View full text Add to dashboard Cite

International audienceFast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high performance design of such methods usually requires to carefully tune the algorithm for both the targeted physics and hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the FMM algorithm as a task flow and employing a state- of-the-art runtime system, StarPU, to process the tasks on the different computing units. We carefully design the task flow, the mathematical operators, their implementations as well as scheduling schemes. Potentials and forces on 200 million particles are computed in 42.3 seconds on a homogeneous 160 cores SGI Altix UV 100 and good scalability is shown

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Emmanuel Agullo

Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects

A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs

Comparative study of one-sided factorizations with multiple software packages on multi-core hardware

Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model

Task-Based FMM for Multicore Architectures

Contact Info

Product

Resources

About