High-performance parallel computing in industry

Eldredge, Michael; Hughes, Thomas J.R.; Ferencz, Robert M.; Rifai, Steven M.; Raefsky, A.; Herndon, Bruce

doi:10.1016/s0167-8191(97)00049-5

Cited by 13 publications

(2 citation statements)

References 5 publications


“…Parallel computing has been used to advantage in many areas of science and engineering and has increasingly been applied to industrial engineering problems. Overviews of industrial applications of parallel computing are given by Eldredge et al. (1997) and by Thole and Stuben (1999).…”

Section: Introduction

mentioning

confidence: 99%

Simulation of a batch chemical process using parallel computing with PVM and Speedup

Smith

2004

Computers & Chemical Engineering


Speedup, a commercial software package for the dynamic modeling of chemical processes, has been coupled with the PVM software to allow a single process model to be distributed over several computers running in parallel. As an initial application, this coarse distribution technique was applied to a batch chemical plant containing 16 unit operations. Computation time for this problem was reduced by a factor of 4.7 using only three parallel processors in the UNIX computing environment. Better than linear acceleration was achieved from the significant reduction in computation required to reinitialize the smaller subprocesses at discontinuities in the solution. The process was physically divided at points that naturally separated the overall plant into distinct subprocesses. This facilitated the computation by minimizing the interconnection between the parallel units. Techniques were developed to make efficient material and energy transfers between the modeled subprocesses based on actual material transfers used in plant operations.
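The "better than linear" claim in the abstract above can be checked with a short calculation. The sketch below uses the two figures reported there (a speedup of 4.7 on three processors); the function names `parallel_efficiency` and `is_superlinear` are illustrative and do not come from the cited paper.

```python
def parallel_efficiency(speedup: float, n_processors: int) -> float:
    """Return parallel efficiency, defined as speedup divided by processor count."""
    if n_processors <= 0:
        raise ValueError("n_processors must be positive")
    return speedup / n_processors


def is_superlinear(speedup: float, n_processors: int) -> bool:
    """A speedup is superlinear when it exceeds the number of processors."""
    return speedup > n_processors


if __name__ == "__main__":
    # Figures taken from the abstract: factor 4.7 on three processors.
    reported_speedup = 4.7
    processors = 3
    eff = parallel_efficiency(reported_speedup, processors)
    print(f"efficiency = {eff:.2f}")
    print(f"superlinear = {is_superlinear(reported_speedup, processors)}")
```

An efficiency above 1 is what the abstract attributes to the cheaper reinitialization of the smaller subprocesses at discontinuities, not to the parallel hardware alone.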


“…It is now well known that at the higher level, the coarse-grained sub-domains map well to the SPMD-style programming model [35]. Furthermore, the communication model of the MPP significantly reduces the parallel overhead associated with the message passing across the processes [30].…”

mentioning

confidence: 99%

An efficient coarse‐grained parallel algorithm for global–local multiscale computations on massively parallel systems

Rahul

2009

Numerical Meth Engineering


The existing global-local multiscale computational methods, using finite element discretization at both the macro-scale and micro-scale, are intensive both in terms of computational time and memory requirements, and their parallelization using domain decomposition methods incurs substantial communication overhead, limiting their application. We are interested in a class of explicit global-local multiscale methods whose architecture significantly reduces this communication overhead on massively parallel machines. However, a naïve task decomposition based on distributing individual macro-scale integration points to a single group of processors is not optimal and leads to communication overheads and idling of processors. To overcome this problem, we have developed a novel coarse-grained parallel algorithm in which groups of macro-scale integration points are distributed to a layer of processors. Each processor in this layer communicates locally with a group of processors that are responsible for the micro-scale computations. The overlapping groups of processors are shown to achieve optimal concurrency at significantly reduced communication overhead. Several example problems are presented to demonstrate the efficiency of the proposed algorithm.

…to ensure both the efficiency and the reliability of these methods. While there has been excellent progress in the development of multiscale methods, the issue of efficiency has not received sufficient attention.

Scale linking is currently performed using hierarchical [2] and concurrent [2-9] schemes. Global-local multiscale methods [10-20] fall within the category of hierarchical multiscale methods, where the stress-strain relationship at every integration point of the macro-scale is computed by suitably deforming an associated representative volume element (RVE). The major advantage of this class of methods is the ability to model arbitrary non-linearities at the micro-scale, as no a priori constitutive assumption is made at the macro-scale. Whereas finite elements are used to discretize the spatial scale at the macro-scale, a variety of techniques have been used to model the RVE, including traditional finite elements [15,18-20], the Voronoi cell finite element method (VCFEM) [13,14], a crystal plasticity framework [16] and numerical methods based on Fast Fourier Transforms [17,21]. However, a major disadvantage of these fully coupled computational techniques is that they are intensive in terms of processor time and memory requirements.

Interesting attempts to improve the computational efficiency include re-formulation of the global problem [22], and incorporation of micro-scale effects directly into the finite element basis functions to capture their effect on the macro-scale [23]. In the latter approach, the construction of the basis functions is fully decoupled from one element to the other and hence it is naturally adapted to massively parallel computing. In [24], a structural decomposition-based parallel computation strategy is used for the multiscale computation,...
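The two-level layout described in the abstract (groups of macro-scale integration points assigned to a layer of processors, each paired with a group of micro-scale processors) can be illustrated with a simplified sketch. This is not the authors' algorithm: the helper names `split_evenly` and `two_level_layout`, the contiguous rank numbering, and the fixed micro-group size are all assumptions, and the micro-scale groups here are disjoint, whereas the paper describes overlapping groups.

```python
def split_evenly(n_items: int, n_groups: int) -> list[list[int]]:
    """Split item indices 0..n_items-1 into n_groups contiguous, near-equal chunks."""
    base, extra = divmod(n_items, n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # first `extra` groups get one more
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def two_level_layout(n_points: int, n_macro_procs: int, micro_group_size: int) -> list[dict]:
    """Assign integration points to macro-layer ranks and pair each rank with
    its own (disjoint, hypothetical) group of micro-scale ranks."""
    point_groups = split_evenly(n_points, n_macro_procs)
    layout = []
    next_rank = n_macro_procs  # micro ranks are numbered after the macro layer
    for macro_rank, points in enumerate(point_groups):
        micro_ranks = list(range(next_rank, next_rank + micro_group_size))
        next_rank += micro_group_size
        layout.append(
            {"macro_rank": macro_rank, "points": points, "micro_ranks": micro_ranks}
        )
    return layout


if __name__ == "__main__":
    for entry in two_level_layout(n_points=10, n_macro_procs=3, micro_group_size=2):
        print(entry)
```

Grouping several integration points per macro-layer processor is what distinguishes this layout from the naïve decomposition the abstract criticizes, in which each point is dispatched individually to a single pool of processors.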


Performance analysis of a millimeter wave MIMO channel estimation method in an embedded multi-core processor

et al. 2022


The emerging Multi-Processor System-on-Chip (MPSoC) technology, which combines heterogeneous computing with the high performance of field programmable gate arrays (FPGAs), is a promising platform for a large number of applications, including wireless communications and vehicular technology. In this specific application context, when multiple-input multiple-output (MIMO) scenarios are considered, the system usually has to manage a large number of communication links among sensors and antennas involving different vehicles and users. Millimeter wave (mmWave) communications are one of the key technology enablers toward achieving high data rates in beyond 5G systems (B5G). Communication at these frequency bands usually involves the use of large antenna arrays, often requiring high computational resources. One of the candidate platforms able to manage a huge number of communications is the Xilinx Zynq UltraScale+ EG Heterogeneous MPSoC, which is composed of a dual-core Cortex-R5, a quad-core ARM Cortex-A53, a graphics processing unit (GPU) and a high-end FPGA. This work analyzes the computational performance required by a recent mmWave MIMO channel estimation algorithm on a platform of this kind. As a first approach, we focus on the performance that can be achieved via the quad-core ARM Cortex-A53. To this end, we use the numerical linear algebra libraries BLAS and LAPACK. The results show that our reference implementation is able to manage a large MIMO communication system with 256 antennas without exhausting platform resources.
