Optimizing message-passing on multicore architectures using hardware multi-threading

Buono, Daniele; Matteis, Tiziano De; Mencagli, Gabriele; Vanneschi, Marco

doi:10.1109/pdp.2014.63

Cited by 11 publications

(2 citation statements)

References 19 publications


“…[15] and used to synchronize and exchange messages between threads like in Ref. [38]. The mechanism provides a point-to-point communication between two partners, sender (S) and receiver (R), with a buffer of one position.…”

Section: The Base Mechanism (mentioning)

confidence: 99%

The home-forwarding mechanism to reduce the cache coherence overhead in next-generation CMPs

Mencagli

Vanneschi

Lametti

2018

Future Generation Computer Systems

Self Cite


On the road to computer systems able to support the requirements of exascale applications, Chip Multi-Processors (CMPs) are equipped with an ever-increasing number of cores interconnected through fast on-chip networks. To exploit such new architectures, parallel software must be able to scale almost linearly with the number of available cores. To this end, the overhead introduced by the run-time system of parallel programming frameworks and by the architecture itself must be small enough to enable high scalability even for very fine-grained parallel programs. One approach to reducing this overhead is to use non-conventional architectural mechanisms that prove useful when certain concurrency patterns in the running application are statically or dynamically recognized. Following this idea, this paper proposes a run-time support able to reduce the effective latency of inter-thread cooperation primitives by lowering the contention on individual caches. To achieve this goal, the new home-forwarding hardware mechanism is proposed and used by our run-time support to reduce the number of cache-to-cache interactions generated by the cache coherence protocol. Our ideas have been emulated on the Tilera TILEPro64 CMP, showing a significant speedup improvement in some initial benchmarks.


“…MPI allows to communicate among cores on different nodes, and one could think that it introduces performance overheads at the node level compared with OpenMP. But this is a controversial issue with no clear answer as shown in [14,6].…”

Section: Introduction (mentioning)

confidence: 99%

OMP2MPI: Automatic MPI code generation from OpenMP programs

Saa-Garriga

Castells-Rufas

Carrabina

2015

Preprint


In this paper, we present OMP2MPI, a tool that automatically generates MPI source code from OpenMP. With this transformation, the original program can be adapted to exploit a larger number of processors by surpassing the limits of the node level on large HPC clusters. The transformation can also be useful for adapting the source code to execute on distributed-memory many-cores with message-passing support. In addition, the resulting MPI code can be used as a starting point that can be further optimized by software engineers. The transformation process focuses on detecting OpenMP parallel loops and distributing them in a master/worker pattern. A set of micro-benchmarks has been used to verify the correctness of the transformation and to measure the resulting performance. Surprisingly, not only is the automatically generated code correct by construction, but it also often performs faster, even when executed with MPI.


Fundamental Concepts

Lorenzon

Filho

2019

Parallel Computing Hits the Power Wall

