Algorithmic Time, Energy, and Power on Candidate HPC Compute Building Blocks

Choi, Jee; Dukhan, Marat; Liu, Xing; Vuduc, Richard

doi:10.1109/ipdps.2014.54

Cited by 60 publications

(48 citation statements)

References 22 publications

“…5, right) and than other affinity modes until 180 threads. Since CG is the most memory-accessing application among the ones considered here as was shown in [23], this situation may be explained by the findings in [13] that Intel Xeon Phi uses less energy for memory accesses. Under the compact affinity mode, the neighboring threads, which are more likely to access memory simultaneously, are located in the same core.…”

Section: ) Energy (mentioning)

confidence: 87%

“…An instruction-level energy model has been used by Shao and Brooks [12] with the Linpack benchmark suite to observe increases in energy efficiency as high as 10%. Choi et al [13] conducted a microbenchmarking study and found that the Intel Xeon Phi offers energy benefits to highly irregular data processing workloads. The Xeon Phi requires one magnitude less energy per access during random memory access operations.…”

Section: A Related Work (mentioning)

confidence: 99%

Energy Evaluation for Applications with Different Thread Affinities on the Intel Xeon Phi

Lawson

Sosonkina

Shen

2014

2014 International Symposium on Computer Architecture and High Performance Computing Workshop

The Intel Xeon Phi coprocessor offers high parallelism on energy-efficient hardware to minimize energy consumption while maintaining performance. Dynamic frequency and voltage scaling is not accessible on the Intel Xeon Phi. Hence, saving energy relies mainly on tuning application performance. One general optimization technique is thread affinity, which is an important factor in multi-core architectures. This work investigates the effects of varying thread affinity modes and reducing core utilization on energy and execution time for the NASA Advanced Supercomputing Parallel Benchmarks (NPB). Energy measurements are captured using the micsmc utility tool available on Xeon Phi. The measurements are checked against total power captured using Wattsup power meters. The results are compared to the system-default thread affinity and granularity modes. Mostly positive impacts on performance and energy are observed: When executed at the maximum thread count on all unoccupied cores, all the benchmarks but one exhibited energy savings if a specific affinity mode is set.

Keywords: Intel Xeon Phi, energy, thread affinity, NAS benchmarks

“…In [17], the impact of data movement on the total energy consumption is characterized for the NAS parallel benchmarks and several scientific applications. In [6], the authors extend their energy roofline model to capture arithmetic and basic cache memory energy access costs as well as more elaborate random access patterns.…”

Section: Related Work (mentioning)

confidence: 99%

Modeling power consumption of 3D MPDATA and the CG method on ARM and Intel multicore architectures

2017

We propose an approach to estimate the power consumption of algorithms, as a function of the frequency and number of cores, using only a very reduced set of real power measures. In addition, we also provide the formulation of a method to select the voltage-frequency scaling-concurrency throttling configurations that should be tested in order to obtain accurate estimations of the power dissipation. The power models and selection methodology are verified using two real scientific applications: the stencil-based 3D MPDATA algorithm and the conjugate gradient (CG) method for sparse linear systems. MPDATA is a crucial component of the EULAG model, which is widely used in weather forecast simulations. The CG algorithm is the keystone for iterative solution of sparse symmetric positive definite linear systems via Krylov subspace methods. The reliability of the method is confirmed for a variety of ARM …

The researchers from Czestochowa University of Technology were supported by the National Science Centre, Poland, under Grant No. UMO-2015/17/D/ST6/04059. The researcher from Universidad Jaime I (UJI) was supported by the CICYT Project TIN2014-53495-R of MINECO and FEDER. This work was partially performed during a short-term scientific mission (STSM)
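For readers unfamiliar with the "energy roofline model" named in the citation statement for this entry, the following is a minimal Python sketch of a roofline-style time, energy, and power estimate. It is a simplified illustration, not necessarily the exact formulation used in the cited work: it assumes that compute and memory traffic overlap perfectly (so time is the larger of the two), that energy is a per-operation cost plus a constant power drawn for the whole run, and that only one memory level is counted. The class name, field names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical per-operation costs for one compute building block."""
    time_per_flop: float    # seconds per arithmetic operation
    time_per_byte: float    # seconds per byte moved to or from memory
    energy_per_flop: float  # joules per arithmetic operation
    energy_per_byte: float  # joules per byte moved
    constant_power: float   # watts drawn regardless of activity


def estimate(machine: Machine, flops: float, bytes_moved: float):
    """Return (time in s, energy in J, average power in W).

    Assumption: compute and memory transfers overlap perfectly, so the
    run time is whichever of the two takes longer.
    """
    time = max(flops * machine.time_per_flop,
               bytes_moved * machine.time_per_byte)
    energy = (flops * machine.energy_per_flop
              + bytes_moved * machine.energy_per_byte
              + machine.constant_power * time)
    return time, energy, energy / time


if __name__ == "__main__":
    # Illustrative, made-up numbers; not measurements from any platform.
    example = Machine(time_per_flop=1e-11, time_per_byte=5e-11,
                      energy_per_flop=5e-11, energy_per_byte=5e-10,
                      constant_power=50.0)
    t, e, p = estimate(example, flops=1e12, bytes_moved=1e11)
    print(f"time={t:.3f} s, energy={e:.1f} J, power={p:.1f} W")
```

Varying the ratio of `flops` to `bytes_moved` in such a sketch shows how an algorithm's arithmetic intensity shifts it between compute-bound and memory-bound regimes in both time and energy.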

“…There are also some approaches that model the energy consumption of individual algorithms by considering the operations performed [59], however these approaches are difficult to transfer to other algorithms and they require a significant effort for the analysis at the algorithmic level. Another attempt in finding a relation between properties of the algorithms and the resulting energy consumption and execution time is described in [25], but the results are only presented at the level of micro-benchmarks. So far, there is no broad investigation that determines which algorithmic properties have which effect on the energy consumption for a specific architecture.…”

Section: Algorithmic Techniques Towards Energy Awareness (mentioning)

confidence: 99%

Energy-efficient Algorithms for Ultrascale Systems

Carretero

Distefano

Petcu

et al. 2015

JSFI

The chances to reach Exascale or Ultrascale Computing are strongly connected with the problem of the energy consumption for processing applications. For physical and economical reasons, the energy consumption has to be reduced significantly to make Ultrascale Computing possible. The research efforts towards energy-saving mechanisms of the hardware have already made energy-aware hardware systems available. However, to achieve a strong energy reduction, hardware mechanisms must be complemented with new energy-efficient software that can exploit them so that the foreseen energy savings actually result. In the software area, there also exist a multitude of research approaches towards energy saving, often concentrating either on the system software level or the application organization level, reflecting the expertise of the corresponding research group. The challenge of reducing the energy consumption dramatically to make Ultrascale Computing possible is so ambitious that a concerted action combining research efforts through all the software levels seems reasonable. In this article, we discuss the current research efforts and results related to energy efficiency in the diverse areas of software. We conclude with open problems and questions concerning energy-related techniques with an emphasis on the application or algorithmic side.
