Boosting the performance of computational fluid dynamics codes for interactive supercomputing

Woodward, Paul R.; Jayaraj, Jagan; Lin, Pei-Hung; Yew, Pen-Chung; Knox, Michael; Greensky, James; Nowatski, Anthony; Stoffels, Karl

doi:10.1016/j.procs.2010.04.230

Cited by 3 publications

(6 citation statements)

References 8 publications

“…To fully realize these benefits, massive pipelining of numerical algorithms to achieve the highest possible amount of reuse of cached data is extremely important. The briquette data structure, massive code pipelining, and short, aligned vector operands all are features of the code restructuring and transformation that our team has been advocating as a result of our experience with the IBM Cell processor [2][3][4][5]. We have built and are refining automatic code translators that perform these code restructuring transformations as a precompilation step.…”

Section: Discussion (mentioning)

confidence: 99%

“…It also shares with our full application a very important feature: it has a large difference stencil, and the computation proceeds in phases that alternate between computations of cell and cell interface quantities. Each of these alternating phases of computation becomes transformed in our high performance code expression into a separate stage in a computation pipeline (see [2][3][4][5]). In this way, this PPM advection kernel reflects the overall character of our full application.…”

Section: Advection Example (mentioning)

confidence: 99%

“…It has been extensively transformed so that the entire algorithm is performed in a pipelined fashion exclusively using aligned quadword operands. These code transformations have been described elsewhere [2][3][4][5]. We are building automatic code translators (see [5]) to produce our pipelined Fortran code from a much simpler form that is more easily written, read, modified, and maintained.…”

Section: Advection Example (mentioning)

confidence: 99%

“…The Intel Westmere CPU is able to prefetch all the data needed to update a single grid briquette of 4³ cells for each of 2 threads running on each of its 6 CPU cores during only 43% of the time it takes that thread to update the previous grid briquette in a sequence along the direction of the 1-D pass. The massive pipelining of our fluid dynamics algorithm that makes this possible has been described in detail elsewhere [2][3][4][5]. It results in an overall computational intensity of 33.5 flops per main device memory word read or written, which is sufficient to allow the CPU core to run without waiting on data transfers from or to its main device memory.…”

Section: Application Code (mentioning)

confidence: 99%
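
As a rough illustration of what the quoted intensity figure implies, the sketch below computes the memory bandwidth needed to keep a processor busy at a given flop rate. Only the 33.5 flops-per-word figure comes from the statement above. The function names, the 4-byte word size, and the sample flop rate and bandwidth are assumptions chosen for illustration, not values from the cited paper.

```python
def required_bandwidth_gbs(gflops, flops_per_word, bytes_per_word=4):
    """GB/s of main-memory traffic needed to sustain `gflops`.

    bytes_per_word=4 assumes 32-bit words; this is an assumption,
    not a figure taken from the cited paper.
    """
    gwords_per_second = gflops / flops_per_word  # Gwords/s read or written
    return gwords_per_second * bytes_per_word


def is_compute_bound(gflops, flops_per_word, available_gbs, bytes_per_word=4):
    """True if the memory system can feed the processor at this flop rate."""
    needed = required_bandwidth_gbs(gflops, flops_per_word, bytes_per_word)
    return needed <= available_gbs


if __name__ == "__main__":
    # 33.5 flops/word is quoted above; 67 Gflop/s and 20 GB/s are hypothetical.
    print(required_bandwidth_gbs(67.0, 33.5))
    print(is_compute_bound(67.0, 33.5, available_gbs=20.0))
```

When the required bandwidth is below what the memory system delivers, the code is limited by arithmetic rather than by data transfers, which is the situation the statement describes for the CPU.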

A Study of the Performance of Multifluid PPM Gas Dynamics on CPUs and GPUs

Lin

Jayaraj

Woodward

2011

2011 Symposium on Application Accelerators in High-Performance Computing

Self Cite

The potential for GPUs and many-core CPUs to support high performance computation in the area of computational fluid dynamics (CFD) is explored quantitatively through the example of the PPM gas dynamics code with PPB multifluid volume fraction advection. This code has already been implemented on the IBM Cell processor and run at full scale on the Los Alamos Roadrunner machine. This implementation has involved a complete restructuring of the code that has been described in detail elsewhere. Here the lessons learned from that work are exploited to take advantage of today's latest generations of multi-core CPUs and many-core GPUs. The operations performed by this code are characterized in detail after being first decomposed into a series of individual code kernels to allow an implementation on GPUs. Careful implementations of this code for both CPUs and GPUs are then contrasted from a performance point of view. In addition, a single kernel that has many of the characteristics of the full application on CPUs has been built into a full, standalone, scalable parallel application. This single-kernel application shows the GPU at its best. In contrast, the full multifluid gas dynamics application brings into play computational requirements that highlight the essential differences in CPU and GPU designs today and the different programming strategies needed to achieve the best performance for applications of this type on the two devices. The single kernel application code performs extremely well on both platforms. This application is not limited by main memory bandwidth on either device; instead it is limited only by the computational capability of each. In this case, the GPU has the advantage, because it has more computational cores. The full multifluid gas dynamics code is, however, of necessity memory bandwidth limited on the GPU, while it is still computational capability limited on the CPU. 
We believe that these codes provide a useful context for quantifying the costs and benefits of design decisions for these powerful new computing devices. Suggestions for improvements in both devices and codes based upon this work are offered in our conclusions.

Keywords: GPGPU, multicore CPU, high-performance computing, exascale computing, computational fluid dynamics, parallel programming, source-to-source transformation.

“…In the traditional blocking scheme, each loop runs over the entire domain before proceeding to the next loop. In an alternative scheme (Woodward et al, 2010) all of the loops are run on a block before moving to the next block, as illustrated in Figure 5. Each large rectangle represents the iteration space at different points of progress (indicated by shading), and each subrectangle represents a block of the iteration space that fits into local memory.…”

Section: </Machine> (mentioning)

confidence: 99%
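
The two loop orderings contrasted in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function names and the pointwise `stages` are hypothetical, and each stage is assumed to act on one cell at a time. A real difference stencil would also need ghost cells around each block, which are omitted here.

```python
def sweep_whole_domain(u, stages):
    """Traditional scheme: each loop (stage) runs over the entire domain."""
    for stage in stages:
        u = [stage(x) for x in u]  # full pass over the data per stage
    return u


def sweep_block_by_block(u, stages, block_size):
    """Alternative scheme: run every stage on one block, then move on."""
    out = []
    for start in range(0, len(u), block_size):
        block = u[start:start + block_size]  # sized to fit in local memory
        for stage in stages:
            block = [stage(x) for x in block]  # block stays resident
        out.extend(block)
    return out


if __name__ == "__main__":
    # Hypothetical pointwise stages standing in for real computation phases.
    stages = [lambda x: 2.0 * x, lambda x: x + 1.0]
    data = [float(i) for i in range(10)]
    print(sweep_whole_domain(data, stages) == sweep_block_by_block(data, stages, 4))
```

Both orderings compute the same values. The second touches each block once for all stages instead of streaming the whole domain through memory once per stage, which is the source of the data reuse.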

ExaSAT: An exascale co-design tool for performance modeling

Unat

Chan

Zhang

et al.

2015

The International Journal of High Performance Computing Applications

One of the emerging challenges to design HPC systems is to understand and project the requirements of exascale applications. In order to determine the performance consequences of different hardware designs, analytic models are essential because they can provide fast feedback to the co-design centers and chip designers without costly simulations. However, current attempts to analytically model program performance typically rely on the user manually specifying a performance model. We introduce the ExaSAT framework that automates the extraction of parameterized performance models directly from source code using compiler analysis. The parameterized analytic model enables quantitative evaluation of a broad range of hardware design trade-offs and software optimizations on a variety of different performance metrics, with a primary focus on data movement as a metric. We demonstrate the ExaSAT framework's ability to perform deep code analysis of a proxy application from the DOE Combustion Co-design Center to illustrate its value to the exascale co-design process. ExaSAT analysis provides insights in the hardware and software tradeoffs and lays the groundwork for exploring a more targeted set of design points using cycle-accurate architectural simulators.

Computational Fluid Dynamic Using Parallel Loop of Multi-Cores Processor

Siow

Jaswar

Afrizal

2014

AMM

Computational Fluid Dynamics (CFD) software is often used to study fluid flow and the motion of structures in fluids. CFD normally requires large arrays and large amounts of computer memory, which leads to long execution times. However, innovations in computer hardware such as multi-core processors provide an alternative way to improve program performance. This paper discusses loop parallelization on multi-core processors to optimize sequentially looping CFD code. The loop-parallel CFD code was obtained by applying multi-tasking or multi-threading to the original CFD code, which was developed by one of the authors. The CFD code is based on the Reynolds-Averaged Navier-Stokes (RANS) method. The new CFD program was developed in the Microsoft Visual Basic (VB) programming language. In the early stage, the whole CFD code was constructed as a sequential flow before being modified to a parallel flow using VB's multi-threading library. Fluid flow around the hull of a round-shaped FPSO was selected to compare the performance of the two codes. In addition, results from this self-developed code, such as the pressure distribution around the hull, are presented in this paper.
