Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2010
DOI: 10.1145/1693453.1693476

Scalable communication protocols for dynamic sparse data exchange

Abstract: Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such programs may involve all (or large numbers) of the participating processes, the actual communication operations are usually sparse in nature. As a result, communication phases are typically expressed explicitly using point-to-point communication operations or collective operations. We define the dynamic sparse data-exchange (DSDE) pro…

Cited by 44 publications (50 citation statements)
References 30 publications
“…A solution to this problem is to perform step 6 using a variant of the nonblocking consensus algorithm described by Höfler et al [9]. All sends and receives are tested for completion, a nonblocking barrier posted once the tests are passed, and incoming messages probed for with MPI_Iprobe, all in a loop which only exits if the nonblocking barrier is reached by all participating processes.…”
Section: ) S (mentioning)
confidence: 99%
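The nonblocking-consensus loop quoted above can be sketched in a single process. The following is an illustrative Python simulation, not the MPI implementation: threads stand in for MPI ranks, `Queue.put`/`Queue.get` for tested-complete sends and `MPI_Iprobe`, and a shared counter for the nonblocking barrier; every name (`nbx_rank`, `run_nbx`, the example pattern) is invented for the sketch.

```python
# Single-process simulation of the nonblocking-consensus (NBX) style
# dynamic sparse data exchange. Threads play MPI ranks; a shared counter
# plays the nonblocking barrier. All names here are invented for the sketch.
import queue
import threading

def nbx_rank(rank, nranks, targets, inboxes, arrived, results):
    # Phase 1: "send" one message to each target. Queue.put completes
    # synchronously, standing in for MPI_Issends tested for completion.
    for t in targets:
        inboxes[t].put((rank, f"payload {rank}->{t}"))
    # Phase 2: signal the nonblocking barrier exactly once...
    with arrived["lock"]:
        arrived["count"] += 1
    # ...then keep probing for incoming messages (the MPI_Iprobe analog)
    # until every rank has reached the barrier -- the NBX exit condition.
    received = []
    while True:
        try:
            received.append(inboxes[rank].get(timeout=0.01))
        except queue.Empty:
            with arrived["lock"]:
                if arrived["count"] == nranks and inboxes[rank].empty():
                    break
    results[rank] = received

def run_nbx(send_lists):
    nranks = len(send_lists)
    inboxes = [queue.Queue() for _ in range(nranks)]
    arrived = {"count": 0, "lock": threading.Lock()}
    results = {}
    threads = [threading.Thread(target=nbx_rank,
                                args=(r, nranks, send_lists[r],
                                      inboxes, arrived, results))
               for r in range(nranks)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

# Sparse pattern: rank 0 -> 2, rank 1 -> 0, rank 2 -> 0; no rank knows
# in advance how many messages it will receive.
out = run_nbx({0: [2], 1: [0], 2: [0]})
print(sorted(m[0] for m in out[0]))  # senders seen by rank 0: [1, 2]
```

Because each simulated "send" is enqueued before the sender signals the barrier, a full barrier count guarantees all messages are already in flight, mirroring the role synchronous sends (`MPI_Ssend`/`MPI_Issend`) play in the real protocol.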
“…This variant differs slightly from Höfler's: tests for completion take place before probing. This is because the probe (and subsequent communication) is usually unnecessary, and both MPI_Iprobe and MPI_Ssend do have overhead (see [9] for some measurements pertaining to MPI_Ssend).…”
Section: ) S (mentioning)
confidence: 99%
“…For example, in preparing the data exchange necessary to obtain column data in sparse matrix-vector multiplication, this would be the number of processors having entries on that column [7]. We use the fanout, denoted f , to describe the bounds on our algorithms' costs.…”
Section: Fanout Bound (mentioning)
confidence: 99%
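The fanout notion quoted above can be made concrete with a toy example. This is a hypothetical illustration, assuming a 1D row distribution of a sparse matrix: the fanout of a column is the number of processors owning at least one nonzero in that column, i.e. how many processors need that vector entry. The matrix and ownership map below are made up for the sketch.

```python
# Toy fanout computation for sparse matrix-vector multiplication.
# Data is invented purely for illustration.
nonzeros = {  # row index -> set of column indices with nonzeros
    0: {0, 2},
    1: {2},
    2: {1, 2},
    3: {0},
}
owner = {0: "P0", 1: "P0", 2: "P1", 3: "P1"}  # row -> owning processor

def fanout(col):
    # Number of distinct processors holding a nonzero in this column,
    # i.e. the number of processors that must fetch vector entry `col`.
    return len({owner[r] for r, cols in nonzeros.items() if col in cols})

print(fanout(2))  # rows 0,1 (P0) and row 2 (P1) touch column 2 -> 2
print(max(fanout(c) for c in {0, 1, 2}))  # f = maximum fanout over columns
```

The quantity `f = max over columns of fanout(col)` is the kind of parameter the cited work uses to bound the cost of the exchange algorithms.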
“…At the same time, the popularity of distributed parallel programming systems that implement high degrees of dynamic behavior, such as asynchronous tasks [1], work stealing [4,11,12], and messagedriven execution [8,18,19], are increasing. Unlike in bulk synchronous parallel programs, and even in dynamic data exchanges within BSP programs [7], there is often no clear global indication of when some particular distributed computation is complete. Thus, they instead rely on termination detection algorithms to provide that indication.…”
Section: Introduction (mentioning)
confidence: 99%
“…Hoefler et al [14] study this problem and its variations, and design new, specialized collectives to address this problem. Hoefler and Träff [15] also make the case for better support of "sparse" communication patterns within MPI, where the sparsity refers to the number of communicating processes.…”
Section: Collective Communication Optimization (mentioning)
confidence: 99%