Scalable heterogeneous computing (SHC) architectures are emerging in response to new requirements for low cost, power efficiency, and high performance. For example, many contemporary HPC systems use commodity Graphics Processing Units (GPUs) to supplement traditional multicore processors. Yet scientists continue to face challenges in using SHC systems. First and foremost, they must combine several programming models and then carefully optimize data movement among these models on each architecture. In this paper, we investigate a programming model for SHC systems that unifies access to the aggregate memory of the GPUs in the system. In particular, we extend the popular and easy-to-use Global Address Space (GAS) programming model to SHC systems. We explore multiple implementation options and demonstrate our solution, GA-GPU, in the context of Global Arrays, a library-based GAS model. We evaluate these options using kernels and NWChem, a scalable computational chemistry application. Our results show that GA-GPU offers considerable benefit to users in terms of programmability, and both our empirical results and our performance model project encouraging performance for future systems with a tightly integrated memory system.
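
To make the programming model concrete, the following is a minimal sketch of the standard Global Arrays C interface that GA-GPU builds on; the GPU-resident variant described in this paper is not shown, and the array name, dimensions, and memory sizes are illustrative assumptions only.

    /* Minimal Global Arrays example: create a distributed 2-D array and
     * access an arbitrary patch from any process via one-sided put/get.
     * With GA-GPU, the same calls would target aggregate GPU memory. */
    #include <stdio.h>
    #include <mpi.h>
    #include "ga.h"
    #include "macdecls.h"

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        GA_Initialize();
        MA_init(C_DBL, 1000000, 1000000);   /* internal memory allocator */

        int dims[2]  = {1024, 1024};        /* global matrix dimensions (illustrative) */
        int chunk[2] = {-1, -1};            /* let GA choose the distribution */
        int g_a = NGA_Create(C_DBL, 2, dims, "A", chunk);

        /* Any process may read or write any patch of the global array,
         * independent of which node physically holds the data. */
        double buf[16];
        int lo[2] = {0, 0}, hi[2] = {3, 3}, ld[1] = {4};
        for (int i = 0; i < 16; i++) buf[i] = (double)i;
        NGA_Put(g_a, lo, hi, buf, ld);
        GA_Sync();
        NGA_Get(g_a, lo, hi, buf, ld);

        if (GA_Nodeid() == 0)
            printf("buf[0] = %f\n", buf[0]);

        GA_Destroy(g_a);
        GA_Terminate();
        MPI_Finalize();
        return 0;
    }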