Moving Scientific Codes to Multicore Microprocessor CPUs

Woodward, Paul R.; Jayaraj, Jagan; Lin, Pei-Hung; Yew, Pen-Chung

doi:10.1109/mcse.2008.152

Cited by 18 publications

(10 citation statements)

References 3 publications

“…It also shares with our full application a very important feature: it has a large difference stencil, and the computation proceeds in phases that alternate between computations of cell and cell interface quantities. Each of these alternating phases of computation becomes transformed in our high performance code expression into a separate stage in a computation pipeline (see [2][3][4][5]). In this way, this PPM advection kernel reflects the overall character of our full application.…”

Section: Advection Example (mentioning)

confidence: 99%
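The citation statement above describes phases that alternate between cell and cell-interface quantities, each becoming a stage in a pipeline. A minimal sketch of that idea follows. It assumes a much simpler stand-in than PPM: a one-dimensional second-order upwind scheme with centered slopes, a positive constant Courant number `c`, and untouched boundary cells. The function names `advect_phased` and `advect_pipelined` are hypothetical and do not come from the cited papers.

```python
def advect_phased(a, c):
    """Three separate whole-array phases with full-size temporaries."""
    n = len(a)
    s = [0.0] * n      # cell quantity: centered slope
    f = [0.0] * n      # interface quantity: f[i] is the flux between cells i-1 and i
    out = list(a)      # boundary cells are left unchanged
    for i in range(1, n - 1):              # phase 1: cell slopes
        s[i] = 0.5 * (a[i + 1] - a[i - 1])
    for i in range(2, n):                  # phase 2: interface fluxes (upwind, c > 0)
        f[i] = c * (a[i - 1] + 0.5 * (1.0 - c) * s[i - 1])
    for i in range(2, n - 1):              # phase 3: cell update
        out[i] = a[i] - (f[i + 1] - f[i])
    return out


def advect_pipelined(a, c):
    """Same arithmetic, but each phase is a stage inside one sweep.

    The whole-array temporaries shrink to a few scalars that are
    handed from one stage to the next.
    """
    n = len(a)
    out = list(a)
    f_prev = None                          # flux through the left interface of cell i
    for i in range(1, n - 1):
        s = 0.5 * (a[i + 1] - a[i - 1])                # stage 1: slope of cell i
        f_next = c * (a[i] + 0.5 * (1.0 - c) * s)      # stage 2: flux leaving cell i
        if f_prev is not None:                         # stage 3: needs both fluxes
            out[i] = a[i] - (f_next - f_prev)
        f_prev = f_next
    return out
```

Both functions perform the same floating-point operations in the same order, so their results should match exactly. The pipelined form trades the whole-array temporaries for values that can stay in registers or cache, which is the kind of benefit the citing text attributes to restructuring into pipeline stages.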

A Study of the Performance of Multifluid PPM Gas Dynamics on CPUs and GPUs

Lin, Jayaraj, Woodward, 2011

2011 Symposium on Application Accelerators in High-Performance Computing

Self Cite

The potential for GPUs and many-core CPUs to support high performance computation in the area of computational fluid dynamics (CFD) is explored quantitatively through the example of the PPM gas dynamics code with PPB multifluid volume fraction advection. This code has already been implemented on the IBM Cell processor and run at full scale on the Los Alamos Roadrunner machine. This implementation has involved a complete restructuring of the code that has been described in detail elsewhere. Here the lessons learned from that work are exploited to take advantage of today's latest generations of multi-core CPUs and many-core GPUs. The operations performed by this code are characterized in detail after being first decomposed into a series of individual code kernels to allow an implementation on GPUs. Careful implementations of this code for both CPUs and GPUs are then contrasted from a performance point of view. In addition, a single kernel that has many of the characteristics of the full application on CPUs has been built into a full, standalone, scalable parallel application. This single-kernel application shows the GPU at its best. In contrast, the full multifluid gas dynamics application brings into play computational requirements that highlight the essential differences in CPU and GPU designs today and the different programming strategies needed to achieve the best performance for applications of this type on the two devices. The single kernel application code performs extremely well on both platforms. This application is not limited by main memory bandwidth on either device; instead it is limited only by the computational capability of each. In this case, the GPU has the advantage, because it has more computational cores. The full multifluid gas dynamics code is, however, of necessity memory bandwidth limited on the GPU, while it is still computational capability limited on the CPU. 
We believe that these codes provide a useful context for quantifying the costs and benefits of design decisions for these powerful new computing devices. Suggestions for improvements in both devices and codes based upon this work are offered in our conclusions.

Keywords: GPGPU, multicore CPU, high-performance computing, exascale computing, computational fluid dynamics, parallel programming, source-to-source transformation.

“…The indexing of these vector temporaries is baroque (cf. [4][5][6]). The programming effort required to produce such a code, modify it, debug it, and maintain it is excessive.…”

Section: The Solution: Extreme Pipelining of the Computation (mentioning)

confidence: 99%

“…The body of this outer loop will consist of a series of vector loops with tests and jumps to the end of the outer loop in between some of these inner loops (cf. [5,6]). This is basically the same program transformation that we described in [3] at a lower level of dimensionality and that, with that lower dimensionality, we used in alternative expressions of our sPPM benchmark code in the late 1990s.…”

Section: Reducing the Programming Burden (mentioning)

confidence: 99%
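The loop shape described in the statement above, an outer loop whose body is a series of short vector loops separated by tests that jump to the end of the outer loop, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the three-stage update is a toy stencil (centered slopes, a smooth interface value, a difference of interface values) rather than a usable advection scheme, the block width `w` and the names `update_reference` and `update_blocked` are hypothetical, and the temporaries are kept full-size for clarity where a production code would use short rolling buffers.

```python
def update_reference(a, c):
    """Toy three-stage stencil written as three whole-array loops."""
    n = len(a)
    s = [0.0] * n
    face = [0.0] * n          # face[j]: value at the interface between cells j and j+1
    out = list(a)
    for j in range(1, n - 1):
        s[j] = 0.5 * (a[j + 1] - a[j - 1])
    for j in range(1, n - 2):
        face[j] = 0.5 * (a[j] + a[j + 1]) - (s[j + 1] - s[j]) / 6.0
    for i in range(2, n - 2):
        out[i] = a[i] - c * (face[i] - face[i - 1])
    return out


def update_blocked(a, c, w=4):
    """Same arithmetic as one outer loop over blocks of width w.

    Each stage is a short inner loop; stage k+1 runs one block behind
    stage k, and `continue` plays the role of the jump to the end of
    the outer loop while the pipeline is still filling.
    """
    n = len(a)
    s = [0.0] * n             # full-size here for clarity only
    face = [0.0] * n
    out = list(a)
    nblocks = (n + w - 1) // w
    for k in range(nblocks + 2):              # two extra steps drain the pipeline
        # Stage 1: slopes for block k.
        lo, hi = k * w, min((k + 1) * w, n)
        for j in range(max(lo, 1), min(hi, n - 1)):
            s[j] = 0.5 * (a[j + 1] - a[j - 1])
        if k < 1:
            continue                          # stage 2 has no input yet
        # Stage 2: interface values for block k-1 (needs s from block k).
        lo, hi = (k - 1) * w, min(k * w, n)
        for j in range(max(lo, 1), min(hi, n - 2)):
            face[j] = 0.5 * (a[j] + a[j + 1]) - (s[j + 1] - s[j]) / 6.0
        if k < 2:
            continue                          # stage 3 has no input yet
        # Stage 3: cell update for block k-2 (needs face from block k-1).
        lo, hi = (k - 2) * w, min((k - 1) * w, n)
        for i in range(max(lo, 2), min(hi, n - 2)):
            out[i] = a[i] - c * (face[i] - face[i - 1])
    return out
```

Because every stage reads only values produced in the current or an earlier outer iteration, the blocked version should reproduce the reference result exactly for any block width.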

Boosting the performance of computational fluid dynamics codes for interactive supercomputing

Woodward, Jayaraj, Lin, et al. 2010

Procedia Computer Science

Self Cite

An extreme form of pipelining of the Piecewise-Parabolic Method (PPM) gas dynamics code has been used to dramatically increase its performance on the new generation of multicore CPUs. Exploiting this technique, together with a full integration of the several data post-processing and visualization utilities associated with this code has enabled numerical experiments in computational fluid dynamics to be performed interactively on a new, dedicated system in our lab, with immediate, user controlled visualization of the resulting flows on the PowerWall display. The code restructuring required to achieve the necessary CPU performance boost, as well as the parallel computing methods and systems used to enable interactive flow simulation are described. Requirements for these techniques to be applied to other codes are discussed, and our plans for tools that will assist programmers to exploit these techniques are briefly described. Examples showing the capability of the new system and software are given for applications in turbulence and stellar convection.

“…Like other heterogeneous systems supporting multiple instruction set architectures, the Cell is not easy to program [27], requiring two separate source codes: one for the PPE, and the second for the SPEs. However, it is relatively easier to program than GPUs employing new programming languages, such as OpenGL and CUDA.…”

Section: STI Cell BE (mentioning)

confidence: 99%

Image processing applications performance study on Cell BE and Blue Gene/L

El-Moursy, Sibai, 2010

Concurrency and Computation

Summary: Two image processing applications, edge detection and image resizing, are studied in this paper on two HPC platforms namely the Cell BE and the Blue Gene/L machines. In this paper we focus on the performance scalability of the studied applications. Our results show that the scale of the problem to be solved highly affects the fitness of the platform. If the data set size is to fit into the Cell core, the fast on-chip inter-core communication of a multi-core system pays back for its high technology design. On the other hand, the overhead of the distant communication in the massively parallel Blue Gene/L machine will only show its benefits for huge data set size that otherwise mandates multiple round-trip data communications between the local memory of a core and main memory.
