Optimizing the steady-state throughput of scatter and reduce operations on heterogeneous platforms

Legrand, Arnaud; Marchal, Loris; Robert, Yves

doi:10.1109/ipdps.2004.1303181

Cited by 5 publications

(7 citation statements)

References 9 publications


“…Legrand, Marchal and Robert [11] study steady-state situations where a series of reductions are performed. As in our work, the reduction operation is assumed to be indivisible, transfers and computations can overlap and the full-duplex one-port model is considered.…”

Section: Related Work (mentioning)

confidence: 99%

Scheduling associative reductions with homogeneous costs when overlapping communications and computations

Canon

2013

20th Annual International Conference on High Performance Computing

Reduction is a core operation in parallel computing. Optimizing its cost has a high potential impact on the application execution time, particularly in MPI and MapReduce computations. In this paper, we propose an optimal algorithm for scheduling associative reductions. We focus on the case where communications and computations can be overlapped to fully exploit resources. Our algorithm greedily builds a spanning tree by starting from the sink and by adding a parent at each iteration. Bounds on the completion time of optimal schedules are then characterized. To show the algorithm extensibility, we adapt it to model variations in which either communication or computation resources are limited. Moreover, we study two specific spanning trees: while the binomial tree is optimal when there is either no transfer or no computation, the Fibonacci tree is optimal when the transfer cost is equal to the computation cost. Finally, approximation ratios of strategies that are derived from those trees are drawn.
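The binomial tree named in the abstract above can be illustrated with a minimal sketch. It assumes the sink is rank 0 and that every transfer takes one synchronous round; the function name `binomial_reduce_schedule` and the rank numbering are illustrative choices, not taken from the cited paper, and the sketch does not reproduce the paper's greedy algorithm or its Fibonacci tree.

```python
def binomial_reduce_schedule(n):
    """Build a binomial reduction tree over ranks 0..n-1, rooted at rank 0.

    Hypothetical helper for illustration only.
    Returns (parent, rounds): parent[r] is the rank that receives r's partial
    result, and rounds[k] lists the (sender, receiver) pairs active in round k.
    """
    parent = {}
    by_round = {}
    for rank in range(1, n):
        lowest_bit = rank & -rank              # lowest set bit of the rank
        parent[rank] = rank - lowest_bit       # clear that bit to get the parent
        k = lowest_bit.bit_length() - 1        # the rank sends in round k
        by_round.setdefault(k, []).append((rank, parent[rank]))
    return parent, [by_round[k] for k in sorted(by_round)]


if __name__ == "__main__":
    parent, rounds = binomial_reduce_schedule(8)
    for k, pairs in enumerate(rounds):
        print(f"round {k}: {pairs}")
```

In each round every rank sends or receives at most one message, so the schedule is compatible with a one-port model; for 8 ranks it finishes in 3 rounds.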

“…But the general problem is very difficult: we have proved that determining the optimal throughput is a NP-complete problem. This negative result demonstrates that pipelining multicasts is more difficult than pipelining broadcasts, scatters or reduce operations, for which optimal polynomial algorithms have been introduced [6,22]. In particular, although the broadcast and the multicast problems seem very similar (the target set is the only difference), there is a complexity gap between them: the best throughput to broadcast message to all nodes in the platform can be found in polynomial time, whereas finding the best throughput to multicast a message to a strict subset of these nodes is NP-hard.…”

Section: Results (mentioning)

confidence: 99%

“…From the solution of the Multicast-UB($P$, $P_{\text{target}}$) linear program, a solution achieving the same throughput can be easily obtained. The interested reader may refer to [22,21] where the reconstruction scheme from the linear program is presented. In this section, we prove that such a solution may differ at most by a factor $|P_{\text{target}}|$ from the solution of the throughput given by the solution of Multicast-LB($P$, $P_{\text{target}}$), thus providing a heuristic with a guaranteed factor $|P_{\text{target}}|$.…”

Section: Distance Between Lower and Upper Bounds (mentioning)

confidence: 99%

“…In previous papers, we have dealt with other communication primitives than the multicast operation. We have shown how to compute the optimal steady-state throughput for a series of scatter or reduce operations [22,21], and a series of broadcast operations [6,5]. The idea is to characterize the steady-state operation of each resource through a linear program in rational numbers (that can thus be solved with a complexity polynomial in the platform size), and then to derive a feasible periodic schedule from the output of the program (and to describe this schedule in polynomial size too).…”

Section: Introduction (mentioning)

confidence: 99%
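The steady-state idea quoted above (constrain each resource's activity per time unit, then read off a periodic schedule) can be sketched on a deliberately simple case. This is not the linear program from the cited papers: it assumes a star platform in which a single source sends directly to every target under the one-port model, where the only constraint is that the source's total sending time per time unit is at most 1. The names `star_scatter_throughput` and `link_busy_fractions` and the per-message costs are hypothetical.

```python
from fractions import Fraction


def star_scatter_throughput(costs):
    """Steady-state scatters per time unit on a one-port star (illustrative).

    costs[i] is the assumed time for the source to send one message to target i.
    One scatter keeps the source busy for sum(costs), so the constraint
    throughput * sum(costs) <= 1 is tight at the optimum.
    """
    total = sum(Fraction(c) for c in costs)
    return 1 / total


def link_busy_fractions(costs):
    """Fraction of each time unit the source spends sending to each target."""
    throughput = star_scatter_throughput(costs)
    return [throughput * Fraction(c) for c in costs]


if __name__ == "__main__":
    costs = [1, 2, 3]                      # assumed per-message costs
    print(star_scatter_throughput(costs))  # rational throughput
    print(link_busy_fractions(costs))      # shares of the source's port
```

Exact rationals are used to mirror the "linear program in rational numbers" wording; on a general platform graph the constraints no longer collapse to a closed form and a real LP solver would be needed.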

Complexity results and heuristics for pipelined multicast operations on heterogeneous platforms

Beaumont

Legrand

Marchal

et al.

2004

International Conference on Parallel Processing, 2004. ICPP 2004.

Self Cite

“…Reductions have been used in distributed programs for years, and standards such as Message Passing Interface (MPI) usually include a ‘reduce’ function together with other collective communications (see for experimental comparisons). Many algorithms have been introduced to optimize this operation on various platforms, with homogeneous or heterogeneous communication costs. Recently, this operation has received more attention because of the success of the MapReduce framework, which has been popularized by Google.…”

Section: Introduction (mentioning)

confidence: 99%

Non‐clairvoyant reduction algorithms for heterogeneous platforms

Benoît

Canon

Marchal

2014

Concurrency and Computation

Self Cite

Abstract: We revisit the classical problem of the reduction collective operation in a heterogeneous environment. We discuss and evaluate four algorithms that are non-clairvoyant, i.e., they do not know in advance the computation and communication costs. On the one hand, Binomial-stat and Fibonacci-stat are static algorithms that decide in advance which operations will be reduced, without adapting to the environment; they were originally defined for homogeneous settings. On the other hand, Tree-dyn and Non-Commut-Tree-dyn are fully dynamic algorithms, for commutative or non-commutative reductions. With identical computation costs, we show that these algorithms are approximation algorithms. When costs are exponentially distributed, we perform an analysis of Tree-dyn based on Markov chains. Finally, we assess the relative performance of all four non-clairvoyant algorithms with heterogeneous costs through a set of simulations.
