The sparse matrix-vector multiply (SpMV) operation is a key computational kernel in many simulations and linear solvers. The large communication requirements associated with a reference implementation of a parallel SpMV result in poor parallel scalability. The cost of communication depends on the physical locations of the send and receive processes: messages injected into the network are more costly than messages sent between processes on the same node. In this paper, a node-aware parallel SpMV (NAPSpMV) is introduced to exploit knowledge of the system topology, specifically the node-processor layout, to reduce the costs associated with communication. The values of the input vector are redistributed to minimize both the number and the size of messages that are injected into the network during an SpMV, leading to a reduction in communication costs. A variety of computational experiments that highlight the efficiency of this approach are presented.

While the focus here is on the node-processor layout, other aspects of the topology, e.g., socket information, could be used in a similar fashion. The mapping of virtual ranks to physical processors can be easily determined on many supercomputers. The environment variable MPICH_RANK_REORDER_METHOD can be set to a predetermined ordering on Cray machines, while modern Blue Gene machines allow the user to specify the ordering among the coordinates A, B, C, D, E, and T through the variable RUNJOB_MAPPING or the runscript option --mapping.

There are a number of existing approaches for reducing the communication costs associated with sparse matrix-vector multiplication. Communication volume in particular is a limiting factor, and the ordering and parallel partition of a matrix both influence the total data volume. In response, graph partitioning techniques are used to identify more efficient layouts of the data [4,5,6,7]. ParMETIS [8] and PT-Scotch [9], for example, provide parallel matrix partitionings that often lead to improved load balance and more efficient sparse matrix operations. Communication volume is accurately modeled through the use of a hypergraph [10]. As a result, hypergraph partitioning also leads to a reduction in parallel communication requirements, albeit at a larger one-time setup cost. Topology-aware task mapping is used to map partitions accurately onto the allocated nodes of a supercomputer, reducing the overall cost associated with communication [11,12,13,14,15]. The approach introduced in this paper complements these efforts by providing an additional level of optimization in handling communication.

Topology-aware methods and aggregation of data are commonly used to reduce communication costs, particularly in collective operations [16,17,18,19]. Aggregation of data is also used in point-to-point communication through Tram, a library for streamlining messages in which data is aggregated and communicated only through neighboring processors [20]. The method presented in this paper aggregates messages at the node level and communicates all aggregated data at once, yielding little structural change from standard MPI communication while reducing communication costs.
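The node-processor layout that such node-level aggregation relies on can also be obtained portably at run time, in addition to the vendor-specific mapping flags discussed earlier. The following sketch, which assumes an MPI-3 implementation, groups the ranks of MPI_COMM_WORLD by shared-memory domain (in practice, by node) using MPI_Comm_split_type; the communicator name node_comm is illustrative and is not part of the NAPSpMV implementation described in this paper.

    /* Minimal sketch (not the paper's code): discover which MPI ranks
     * share a node by splitting MPI_COMM_WORLD into shared-memory groups. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int world_rank, world_size;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        /* Ranks that can share memory are, in practice, ranks on the same node. */
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                            MPI_INFO_NULL, &node_comm);

        int local_rank, local_size;
        MPI_Comm_rank(node_comm, &local_rank);
        MPI_Comm_size(node_comm, &local_size);

        printf("world rank %d of %d is local rank %d of %d on its node\n",
               world_rank, world_size, local_rank, local_size);

        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }

The resulting local ranks and sizes describe how processes are grouped by node under whatever mapping the job launcher applied.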
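As a concrete illustration of the node-level aggregation described above, the sketch below gathers the values that all processes on one node must send to a single remote node onto a node leader, which then injects one combined message into the network rather than one message per process pair. It reuses node_comm from the previous sketch; the function name node_aggregated_send and the parameter remote_leader are hypothetical placeholders for the bookkeeping that a full SpMV communication package would maintain, and the symmetric receive-and-scatter step on the destination node is omitted.

    #include <mpi.h>
    #include <stdlib.h>

    /* Sketch of node-level aggregation: each rank contributes n_local
     * vector values destined for one remote node; the node leader
     * (local rank 0) forwards them in a single inter-node message. */
    void node_aggregated_send(const double *local_vals, int n_local,
                              int remote_leader, MPI_Comm node_comm)
    {
        int local_rank, local_size;
        MPI_Comm_rank(node_comm, &local_rank);
        MPI_Comm_size(node_comm, &local_size);

        int *counts = NULL, *displs = NULL;
        double *node_buf = NULL;
        int total = 0;

        if (local_rank == 0) {
            counts = malloc(local_size * sizeof(int));
            displs = malloc(local_size * sizeof(int));
        }

        /* Step 1: aggregate on-node data onto the node leader. */
        MPI_Gather(&n_local, 1, MPI_INT, counts, 1, MPI_INT, 0, node_comm);
        if (local_rank == 0) {
            for (int i = 0; i < local_size; i++) {
                displs[i] = total;
                total += counts[i];
            }
            node_buf = malloc(total * sizeof(double));
        }
        MPI_Gatherv(local_vals, n_local, MPI_DOUBLE,
                    node_buf, counts, displs, MPI_DOUBLE, 0, node_comm);

        /* Step 2: one message is injected into the network per destination
         * node, rather than one per pair of communicating processes. */
        if (local_rank == 0) {
            MPI_Send(node_buf, total, MPI_DOUBLE, remote_leader, 0,
                     MPI_COMM_WORLD);
            free(node_buf);
            free(counts);
            free(displs);
        }
    }

The on-node gather and the single inter-node send mirror the structure of standard MPI point-to-point communication, which is what allows the aggregation to be introduced with little structural change.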