2001
DOI: 10.1155/2001/891073

Workload Decomposition Strategies for Shared Memory Parallel Systems with OpenMP

Abstract: A crucial issue in parallel programming (both for distributed- and shared-memory architectures) is work decomposition. The work decomposition task can be accomplished without large programming effort by using high-level parallel programming languages such as OpenMP. However, particular care must still be paid to achieving the performance goals. In this paper we introduce and compare two decomposition strategies, in the framework of shared-memory systems, as applied to a case-study particle-in-cell application. A num…

Cited by 5 publications (6 citation statements)
References 18 publications
“…Race conditions can still occur, however, in the labelling phase, in which each particle is assigned, within a parallel loop over particles, to its interval and labelled with the incremented value of a counter: different threads may try to update the counter of the same interval at the same time. The negative impact of such race conditions on the parallelization efficiency can be contained by avoiding the execution of a complete labelling procedure for all particles at each time step, and instead updating this indexing "by intervals" only for the particles that have changed interval during the last time step [4]. The integration of the inter-node domain-decomposition strategy with the intra-node particle-decomposition one does not present any relevant problem.…”
Section: MPI Implementation of the Inter-node Domain Decomposition
confidence: 99%
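The race described in the quotation arises because several threads may increment the same per-interval counter concurrently. As a purely illustrative, hedged sketch (not the cited paper's code), one way to make that increment thread-safe in OpenMP Fortran is an atomic capture (OpenMP 3.1 and later); all names here (label_particles, count, label, the x/dx interval mapping) are hypothetical:

! Minimal sketch of a thread-safe labelling pass (names are illustrative).
! Each particle l is mapped to an interval j and labelled with the
! incremented value of that interval's counter; "atomic capture"
! protects the read-modify-write of count(j) against concurrent threads.
subroutine label_particles(n_particles, n_intervals, x, dx, count, label)
   implicit none
   integer, intent(in)  :: n_particles, n_intervals
   real,    intent(in)  :: x(n_particles), dx
   integer, intent(out) :: count(n_intervals), label(n_particles)
   integer :: l, j

   count(:) = 0
!$OMP parallel do private(l, j)
   do l = 1, n_particles
      j = min(int(x(l)/dx) + 1, n_intervals)   ! interval owning particle l
!$OMP atomic capture
      count(j) = count(j) + 1
      label(l) = count(j)
!$OMP end atomic
   enddo
!$OMP end parallel do
end subroutine label_particles

Protecting the counter this way removes the correctness problem but still serializes contended increments, which is why the quoted strategy of re-labelling only the particles that changed interval remains attractive.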
“…Such workload decomposition strategies can be applied both to distributed-memory parallel systems [6,5] and to shared-memory ones [4]. They can also be combined, when porting a PIC code to a hierarchical distributed-shared memory system (e.g., a cluster of SMPs), into two-level strategies: a distributed-memory level decomposition (among the n_node computational nodes) and a shared-memory one (among the n_proc processors of each node).…”
Section: Introduction
confidence: 99%
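The two-level combination mentioned above can be pictured as an MPI program (the inter-node, distributed-memory level) whose per-rank particle loop is parallelized with OpenMP (the intra-node, shared-memory level). The skeleton below is only a hedged illustration of that structure; all names and the trivial per-particle update are placeholders, not taken from the paper.

! Schematic two-level decomposition: MPI ranks form the distributed-memory
! level (one rank per node), OpenMP threads the shared-memory level.
program two_level_skeleton
   use mpi
   implicit none
   integer :: ierr, my_node, n_node, l
   integer, parameter :: n_local = 100000   ! particles held by this rank
   real :: x(n_local)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, my_node, ierr)   ! this node's rank
   call MPI_Comm_size(MPI_COMM_WORLD, n_node, ierr)    ! number of nodes

   x = 0.0

   ! Intra-node level: the particles owned by this rank are shared
   ! among the n_proc OpenMP threads of the node.
!$OMP parallel do private(l)
   do l = 1, n_local
      x(l) = x(l) + 1.0      ! placeholder for the per-particle update
   enddo
!$OMP end parallel do

   call MPI_Finalize(ierr)
end program two_level_skeleton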
“…Conversely, the domain decomposition does not entail any memory waste, but it does involve particle migration between different portions of the domain, which causes communication overheads and the need for dynamic load balancing [4,7]. Such workload decomposition strategies can be applied both to distributed-memory parallel systems [7,6] and to shared-memory ones [5]. They can also be combined, when porting a PIC code to a hierarchical distributed-shared memory system (e.g., a cluster of SMPs), into two-level strategies: a distributed-memory level decomposition (among the n_node computational nodes) and a shared-memory one (among the n_proc processors of each node).…”
Section: MPI Implementation of the Inter-node Domain Decomposition
confidence: 99%
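With a domain decomposition, the particle migration mentioned in the quotation has to be handled explicitly: after each push, particles whose new position falls outside the locally owned portion of the domain must be identified, shipped to the owning rank, and removed locally. A hedged sketch of that bookkeeping step, with illustrative names only (the ownership map and the actual MPI exchange are assumptions, not the paper's code):

! Collect the indices of particles that have left this rank's subdomain.
subroutine find_leaving_particles(n_local, my_node, owner, n_out, out_idx)
   implicit none
   integer, intent(in)  :: n_local, my_node
   integer, intent(in)  :: owner(n_local)    ! rank owning each particle's new position
   integer, intent(out) :: n_out, out_idx(n_local)
   integer :: l

   n_out = 0
   do l = 1, n_local
      if (owner(l) /= my_node) then
         n_out = n_out + 1
         out_idx(n_out) = l   ! to be packed and sent to owner(l)
      endif
   enddo
   ! The marked particles are then exchanged (e.g. with MPI point-to-point
   ! calls) and removed from the local arrays; the imbalance this creates
   ! over time is what motivates dynamic load balancing.
end subroutine find_leaving_particles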
“…The relevant portion of the pressure-updating extrinsic procedure described in Section 5 then becomes:

p_par = 0.
!$OMP parallel do private(l,j_r,j_theta,j_phi)
do l = 1,UBOUND(r,dim = 1)
   j_r = f_r(r(l))
   j_theta = f_theta(theta(l))
   j_phi = f_phi(phi(l))
!$OMP critical
   p_par(j_r,j_theta,j_phi,1) = &
      p_par(j_r,j_theta,j_phi,1) &
      + h(r(l),...,w(l))
!$OMP end critical
enddo
!$OMP end parallel do

Unfortunately, the intra-node serialization induced by the critical section protecting the shared access to the array p_par represents a bottleneck that heavily affects performance (almost no speedup) [14]. Such a bottleneck can be eliminated, at the expense of memory occupation, by means of an alternative strategy, analogous to that envisaged within the framework of the inter-node decomposition, which relies on the associative and distributive properties of the updating laws for the pressure array with respect to the contributions given by every single particle: the computation of each update is split among the threads into partial computations, each involving only the contribution of the particles managed by the responsible thread; the partial results are then reduced into global ones.…”
Section: Particle Decomposition Strategy
confidence: 99%
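One possible realization of the replicated-array alternative described in the quotation is to let OpenMP itself perform the per-thread partial accumulation and the final reduction, since Fortran OpenMP allows reduction clauses on arrays. The fragment below mirrors the quoted loop but is only an assumption-laden sketch: the mapping functions f_r/f_theta/f_phi follow the quotation, while the argument list of h (elided in the quotation) is filled in here purely for illustration.

! Critical-section-free alternative: each thread accumulates into a private
! copy of p_par and OpenMP sums the copies at the end of the loop.
! This trades extra memory (one array copy per thread) for the removal of
! the serialization on the shared array.
p_par = 0.
!$OMP parallel do private(l, j_r, j_theta, j_phi) reduction(+:p_par)
do l = 1, UBOUND(r, dim=1)
   j_r     = f_r(r(l))
   j_theta = f_theta(theta(l))
   j_phi   = f_phi(phi(l))
   p_par(j_r, j_theta, j_phi, 1) = p_par(j_r, j_theta, j_phi, 1) &
                                 + h(r(l), theta(l), phi(l), w(l))   ! argument list illustrative
enddo
!$OMP end parallel do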