Step 2: Perform experimental measurements on sample cases to determine the model parameters, as well as the times for computation, communication, memory access, and the miscellaneous time for auxiliary instructions.

Step 3: Select the template for regression analysis to estimate the miscellaneous overhead time. Determine the regression coefficients based on the experimentally measured values. The regression formula for miscellaneous overhead time is denoted by f_misc.

Step 4: Based on the experimental measurements, modify the analytical expressions f_mem and f_comm so that the predictions match the experimental timings determined in Step 2. The modifications to f_mem are done to take into account cache effects and the overlap of memory accesses with other operations.
The modifications to f_comm are done to take into account the overlap of communication with computation.

Step 5: Finally, the following formula is obtained to predict the execution time:

T = f_comp + f_comm + f_misc + f_mem
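As a concrete illustration of Steps 3 and 5, the sketch below fits regression coefficients for f_misc by ordinary least squares and then sums the four cost components to predict execution time. The linear regression template f_misc(n) = a*n + b, the sample timings, and all function names are hypothetical assumptions for illustration, not the model or measurements from this study.

```python
# Sketch of Steps 3 and 5. The linear template f_misc(n) = a*n + b and
# all timing numbers below are illustrative assumptions, not measured
# values from the study.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b (closed-form normal equations)."""
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (m * sxy - sx * sy) / (m * sxx - sx * sx)
    b = (sy - a * sx) / m
    return a, b

# Step 2 (assumed): measured miscellaneous overhead for sample problem sizes.
sizes = [100, 200, 400, 800]
misc_measured = [0.5, 0.9, 1.7, 3.3]   # milliseconds, hypothetical

# Step 3: determine the regression coefficients for f_misc.
a, b = fit_linear(sizes, misc_measured)
def f_misc(n):
    return a * n + b

# Step 5: the predicted execution time is the sum of the four components.
def predict_time(n, f_comp, f_comm, f_mem):
    return f_comp(n) + f_comm(n) + f_misc(n) + f_mem(n)

# Hypothetical analytical component models (assumed already calibrated
# against experiment, as in Step 4).
t = predict_time(1600,
                 f_comp=lambda n: 2.0e-3 * n,
                 f_comm=lambda n: 0.5e-3 * n,
                 f_mem=lambda n: 0.3e-3 * n)
print(round(f_misc(1600), 3), round(t, 3))
```

The same pattern extends to the multi-term regression templates used in practice: add columns for each term (e.g. n, n/p, log p) and solve the resulting least-squares system.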
Details

The analytical formulas for the three parallel algorithms are given in Appendix A. In analyzing practical scenarios for parallel machines, the lower-order terms can be significant. These formulas are carefully derived by examining each parallel algorithm to capture all its essential details. The formulas are complex, but the advantage is that the performance predictions are very accurate.

The three algorithms used in the study are well known. LU decomposition is described in [18]. The details of the FFT algorithm can be found in [13]. Cannon's parallel algorithm is described in [41]. The LU decomposition uses a 2-D scattered data layout for the coefficient matrix (see Section 2.3.5.1), and it includes partial pivoting. Different communication patterns are used by the three algorithms. The matrix multiplication uses nearest-neighbor communication, where elements are shifted from one processor to the next along either a row or a column, with wrap-around at the end. In the case of the LU decomposition, communication is need...