In this paper we present a framework for partitioning data parallel computations across a heterogeneous metasystem at runtime. The framework is guided by program and resource information which is made available to the system. Three difficult problems are handled by the framework: processor selection, task placement and heterogeneous data domain decomposition. Solving each of these problems contributes to reduced elapsed time. In particular, processor selection determines the best grain size at which to run the computation, task placement reduces communication cost, and data domain decomposition achieves processor load balance. We present results which indicate that excellent performance is achievable using the framework. The paper extends our earlier work on partitioning data parallel computations across a single-level network of heterogeneous workstations.
INTRODUCTION

A great deal of recent interest has been sparked within academic, industrial and government circles in the emerging technology of metasystem-based high-performance computing. A metasystem is a shared ensemble of workstations, vector and parallel machines connected by local- and wide-area networks (see Figure 1). The promise of on-line gigabit networks coupled with the tremendous computing power of the metasystem makes it very attractive for parallel computations.

The potentially large array of heterogeneous resources in the metasystem offers an opportunity for delivering high performance on a range of parallel computations. Choosing the best set of available resources is a difficult problem and is the subject of this paper. Consider the set of machines in Table 1 and observe that they have different computation and communication capacities. Loosely coupled parallel computations with infrequent communication would probably benefit by applying the fastest set of computational resources (perhaps the DEC Alpha cluster), and may benefit from distribution across many machines. On the other hand, more tightly coupled parallel computations are best suited to machines that have a higher communication capacity (perhaps an Intel Paragon), but may also benefit from distribution across many machines if the computation granularity is sufficient. We address the latter problem in this paper.

We present a framework that automates partitioning and placement of data parallel computations across metasystems such as the one in Figure 1. Partitioning is performed at runtime, when the state of the metasystem resources is known. Three difficult problems are handled by the framework: processor selection, task placement and heterogeneous data domain decomposition. Solving each of these problems contributes to reduced completion time. Processor selection chooses the best number and type of processors to apply to the computation.
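To make the load-balancing idea behind heterogeneous data domain decomposition concrete, the following is a minimal illustrative sketch (not the paper's actual algorithm, which also accounts for communication cost and topology). It splits a one-dimensional data domain across processors in proportion to hypothetical relative speed ratings, so that faster machines receive proportionally more work and all finish in roughly equal time:

```python
def decompose(n_rows, speeds):
    """Assign n_rows of a data domain to processors in proportion to
    their relative speeds. `speeds` is a hypothetical per-processor
    rating (e.g. rows computed per second)."""
    total = sum(speeds)
    # Ideal fractional share for each processor.
    shares = [n_rows * s / total for s in speeds]
    sizes = [int(x) for x in shares]
    # Hand leftover rows to the processors with the largest remainders.
    leftover = n_rows - sum(sizes)
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes

# A machine rated 4x gets twice the rows of a machine rated 2x.
print(decompose(100, [4, 2, 2]))  # → [50, 25, 25]
```

An equal (homogeneous) split of the same domain would give the slow machines as much work as the fast one, leaving the fast machine idle while the others finish; the proportional split avoids that imbalance.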
[Notation list, partially recovered: the i-th network cluster; the i-th processor cluster; application communication topology; message size in bytes; communication cost coefficients; processor-dependent communication function; router cost constants; coercion cost constant; number of messages; ...]