On the performance and energy efficiency of the PGAS programming model on multicore architectures

Lagravière, Jérémie; Langguth, Johannes; Sourouri, Mohammed; Ha, Phuong Hoai; Cai, Xing

doi:10.1109/hpcsim.2016.7568416

Cited by 4 publications

(3 citation statements)

References 25 publications

“…However when using 64 nodes (1024 cores) UPC++ has a strong advantage over UPC. We observed this behavior of UPC having trouble to perform with more than 512 threads before [13,14].…”

Section: Scalability and Performance (mentioning)

confidence: 71%

A Newcomer In The PGAS World -- UPC++ vs UPC: A Comparative Study

Lagravière, Langguth, Prugger et al. 2021

Preprint

Self Cite

A newcomer in the Partitioned Global Address Space (PGAS) 'world' has arrived in its version 1.0: Unified Parallel C++ (UPC++). UPC++ targets distributed data structures where communication is irregular or fine-grained. The key abstractions are global pointers, asynchronous programming via RPC, futures and promises. The UPC++ APIs for moving non-contiguous data and handling memories with different optimal access methods resemble those used in modern C++. In this study we provide two kernels implemented in UPC++: a sparse matrix-vector multiplication (SpMV) as part of a partial differential equation solver, and an implementation of the heat equation on a 2D domain. Code listings of these two kernels are included in the article to show the differences in programming style between UPC and UPC++. We provide a performance comparison between UPC and UPC++ on single-node, multi-node and many-core hardware (Intel Xeon Phi Knights Landing).
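
For readers unfamiliar with the two kernels named in this abstract, the following is a minimal serial sketch in plain Python. It is not the authors' UPC or UPC++ code: the CSR storage layout, the explicit five-point stencil update, the fixed boundary values, and every name (`spmv_csr`, `heat_step`, `alpha`) are assumptions chosen for illustration only.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and values.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def heat_step(u, alpha):
    """Advance a 2D grid by one explicit time step of the heat equation.

    Uses a five-point stencil on interior points; boundary values are
    kept fixed (an assumption of this sketch). alpha stands for the
    combined factor (diffusivity * dt / dx^2), also an assumption.
    """
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]  # copy so boundaries carry over unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            laplacian = (u[i - 1][j] + u[i + 1][j]
                         + u[i][j - 1] + u[i][j + 1]
                         - 4.0 * u[i][j])
            new[i][j] = u[i][j] + alpha * laplacian
    return new


# Hypothetical 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)  # [7.0, 6.0, 17.0]

# Hypothetical 3x3 grid with a single hot interior point.
grid = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
next_grid = heat_step(grid, 0.25)
```

In a PGAS setting the interesting part is how `x` and the grid rows are partitioned across threads or ranks and how remote entries are fetched; this serial sketch deliberately leaves that out.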


“…As described in Deliverable D2.3, understanding the energy complexity of algorithms is crucially important to improve the energy efficiency of algorithms and reduce the energy consumption of computing systems [74,96]. One of the main approaches to understand the energy complexity of algorithms is to devise energy models.…”

Section: Introduction (mentioning)

confidence: 99%

D2.4 Report on the final prototype of programming abstractions for energy-efficient inter-process communication

Ha, Tran, Ibrahim et al. 2018

Preprint

Work package 2 (WP2) aims to develop libraries for energy-efficient inter-process communication and data sharing on the EXCESS platforms. Deliverable D2.4 reports on the final prototype of programming abstractions for energy-efficient inter-process communication. Section 1 is the updated overview of the prototype of programming abstractions and the devised power/energy models. Sections 2-6 contain the latest results of the four studies: [...] D2.3, this model proposed using the Ideal Cache memory model to compute the I/O complexity of algorithms. Besides a case study of SpMV that demonstrates how to apply the ICE model to find the energy complexity of parallel algorithms, Deliverable D2.4 also reports a case study that applies the ICE model to dense matrix multiplication (matmul). The model is then validated with both data-intensive (i.e., SpMV) and computation-intensive (i.e., matmul) algorithms according to three aspects: different algorithms, different input types/sizes and different platforms. To make the report easy to follow, we include a complete study of the ICE model along with the latest results.

1.3 Energy Model on CPU for Lock-free Data-structures in Dynamic Environments

In this section, we first consider the modeling and analysis of the performance of lock-free data structures. We then combine the performance analysis with our power model, introduced in D2.1 [75] and D2.3 [73], to estimate the energy efficiency of lock-free data structures used in various settings.

Lock-free data structures are based on retry loops and are called by application-specific routines. In contrast to the model and analysis provided in D2.3, we consider here lock-free data structures in dynamic environments. The size of each of the retry loops, and the size of the application routines invoked in between, are not constant but may change dynamically.

We present two analytical frameworks for calculating the performance of lock-free data structures. The new frameworks follow two different approaches. The first framework, the simpler one, is based on queuing theory. It introduces an average-based approach that facilitates a more coarse-grained analysis, with the benefit of not depending on the size distributions. Because of this independence from the nature of the distribution, it covers a set of complicated designs. The second approach, instantiated with an exponential distribution for the size of the application routines, uses Markov chains and is tighter because it constructs the execution stochastically, step by step.

Both frameworks provide a performance estimate that is close to what we observe in practice. We have validated our analysis on (i) several fundamental lock-free data structures such as stacks, queues, deques and counters, some of them employing dynamic helping mechanisms, and (ii) synthetic tests covering a wide range of possible lock-free designs. We show the ap...

“…Understanding the energy complexity of algorithms is crucial important to improve the energy efficiency of algorithms [31,30,29,20] and reduce the energy consumption of computing systems [28,27,21]. One of the main approaches to understand the energy complexity of algorithms is to devise energy models.…”

Section: Introduction (mentioning)

confidence: 99%

ICE: A General and Validated Energy Complexity Model for Multithreaded Algorithms

Tran 2016

2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS)

Like time complexity models that have significantly contributed to the analysis and development of fast algorithms, energy complexity models for parallel algorithms are desired as crucial means to develop energy-efficient algorithms for ubiquitous multicore platforms. Ideal energy complexity models should be validated on real multicore platforms and applicable to a wide range of parallel algorithms. However, existing energy complexity models for parallel algorithms are either theoretical without model validation or algorithm-specific without the ability to analyze energy complexity for a wide range of parallel algorithms. This paper presents a new general validated energy complexity model for parallel (multithreaded) algorithms. The new model abstracts away possible multicore platforms by their static and dynamic energy of computational operations and data access, and derives the energy complexity of a given algorithm from its work, span and I/O complexity. The new model is validated by different sparse matrix vector multiplication (SpMV) algorithms and dense matrix multiplication (matmul) algorithms running on high performance computing (HPC) platforms (e.g., Intel Xeon and Xeon Phi). The new energy complexity model is able to characterize and compare the energy consumption of SpMV and matmul kernels according to three aspects: different algorithms, different input matrix types and different platforms. The prediction of the new model regarding which algorithm consumes more energy with different inputs on different platforms is confirmed by the experimental results. In order to improve the usability and accuracy of the new model for a wide range of platforms, the platform parameters of the ICE model are provided for eleven platforms including HPC, accelerator and embedded platforms. (arXiv:1605.08222v2 [cs.DC], 4 Oct 2016)
