Optimal processor dynamic-energy reduction for parallel workloads on heterogeneous multi-core architectures

Barros, Carlos Avelino de; Silveira, Luiz F. Q.; Valderrama, Carlos; Xavier‐de‐Souza, Samuel

doi:10.1016/j.micpro.2015.05.009

Cited by 15 publications (9 citation statements)

References 23 publications


“…Moreover, we do not consider or require different frequencies for each core in a cluster, i.e., all the cores in a cluster run at the same frequency. Inspired by other works [33][34][35], we devised the following model that can be used to estimate the performance of a given parallel application running on a two-cluster HMP.…”

Section: Application Performance Modelling (mentioning; confidence: 99%)

Performance and Energy Trade-Offs for Parallel Applications on Heterogeneous Multi-Processing Systems

et al. 2020


This work proposes a methodology to find performance and energy trade-offs for parallel applications running on Heterogeneous Multi-Processing systems with a single instruction-set architecture. These offer flexibility in the form of different core types and voltage and frequency pairings, defining a vast design space to explore. Therefore, for a given application, choosing a configuration that optimizes the performance and energy consumption is not straightforward. Our method proposes novel analytical models for performance and power consumption whose parameters can be fitted using only a few strategically sampled offline measurements. These models are then used to estimate an application’s performance and energy consumption for the whole configuration space. In turn, these offline predictions define the choice of estimated Pareto-optimal configurations of the model, which are used to inform the selection of the configuration that the application should be executed on. The methodology was validated on an ODROID-XU3 board for eight programs from the PARSEC Benchmark, Phoronix Test Suite and Rodinia applications. The generated Pareto-optimal configuration space represented a 99% reduction of the universe of all available configurations. Energy savings of up to 59.77%, 61.38% and 17.7% were observed when compared to the performance, ondemand and powersave Linux governors, respectively, with higher or similar performance.
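The abstract above keeps only the estimated Pareto-optimal configurations out of the full configuration space. As a minimal sketch, assuming each configuration already carries a predicted execution time and energy (the `Config` type, its field names and the sample values are hypothetical, and this is a generic Pareto filter rather than the authors' performance and power models), the non-dominated set can be extracted with one sort and a single sweep:

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str        # hypothetical label for a core-type / frequency pairing
    time_s: float    # predicted execution time (lower is better)
    energy_j: float  # predicted energy (lower is better)


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configurations not dominated in (time, energy)."""
    front: List[Config] = []
    # Sort by time, breaking ties by energy, so each candidate is at least as
    # slow as everything already on the front.
    for c in sorted(configs, key=lambda c: (c.time_s, c.energy_j)):
        # Energies along the front strictly decrease, so front[-1] holds the
        # lowest energy seen so far; a candidate must beat it to survive.
        if not front or c.energy_j < front[-1].energy_j:
            front.append(c)
    return front
```

Because the list is sorted by time first, a configuration only needs to beat the lowest energy seen so far to be non-dominated, which keeps the filter at O(n log n) even for a large configuration space.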


“…A variable-aware DVFS method was proposed in [9], in which the status of the processor was changed according to the variables of voltage, temperature, process parameter and so on, rather than using frequency threshold and greedy policy. In [10] the models were made for both homogeneous and heterogenous processors, the authors tried to reduce the dynamic power of the processor, and compared the dynamic energy consumption for processing tasks on the two kinds of processors. To find the optimal frequency of each core when processing tasks, decision tree was adopted in [11] to minimize the energy consumption of each user instruction.…”

Section: Related Work (mentioning; confidence: 99%)

Cost-Aware Scheduling of Computation-Intensive Tasks on Multi-Core Server

2018

TIIS


Energy-efficient task scheduling on multi-core servers is a fundamental issue in green cloud computing. Multi-core processors are widely used in mobile devices, personal computers, and servers. Existing energy-efficient task scheduling methods chiefly focus on reducing the energy consumption of the processor itself, and assume that the cores of the processor are controlled independently. However, the cores of some processors on the market are divided into several voltage islands, in each of which the cores must operate in the same state, and the cost of the server includes not only the energy cost of the processor but also the energy of other components of the server and the cost of user waiting time. In this paper, we propose a cost-aware scheduling algorithm, ICAS, for computation-intensive tasks on multi-core servers. Tasks are first allocated to cores, the optimal frequency of each core is then computed, and the frequency of each voltage island is finally determined. Experimental results show that the cost of ICAS is much lower than that of the existing method.
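Both the citation statement and the abstract above concern dynamic energy when cores share a voltage/frequency setting. A minimal sketch of the standard CMOS relation: dynamic power is roughly P = C_eff V^2 f, and a task of N cycles runs for N/f seconds, so its dynamic energy is C_eff V^2 N. The operating-point table, the capacitance value and the "most loaded core sets the island frequency" rule are illustrative assumptions, not ICAS and not the model of [10]:

```python
from typing import List, Tuple

# Hypothetical (frequency in GHz, voltage in V) operating points, ascending.
VF_TABLE: List[Tuple[float, float]] = [(0.8, 0.90), (1.2, 1.00), (1.6, 1.10), (2.0, 1.25)]
C_EFF = 1.0e-9  # assumed effective switched capacitance, in farads


def dynamic_energy(cycles: float, volts: float, c_eff: float = C_EFF) -> float:
    """Dynamic energy E = C_eff * V^2 * cycles (frequency cancels out)."""
    return c_eff * volts ** 2 * cycles


def island_operating_point(core_cycles: List[float], deadline_s: float) -> Tuple[float, float]:
    """Lowest V/f pair at which every core in one island meets the deadline."""
    # All cores in the island share one setting, so the busiest core decides.
    needed_hz = max(core_cycles) / deadline_s
    for f_ghz, volts in VF_TABLE:
        if f_ghz * 1e9 >= needed_hz:
            return f_ghz, volts
    raise ValueError("no operating point meets the deadline")
```

For example, three cores in one island with 1.0e9, 1.5e9 and 0.5e9 cycles and a 1 s deadline need at least 1.5 GHz, so the sketch picks the 1.6 GHz / 1.10 V point for the whole island, and the lightly loaded cores pay the higher V^2 as well. This is the coupling that per-island frequency selection has to account for.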


“…The effect caused by storage mode to algorithm time performance specifically manifests as the latency waiting for the completion of memory access. To balance the cache latency and hit rate, modern CPU usually has a multi-level Cache structure to reduce the average memory access time [15,19]. Before the CPU access memory, multi-level Cache is sequentially inquired until it hits, or memory is accessed when it misses.…”
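The sequential cache lookup described in this quotation is usually summarized by the average memory access time (AMAT): each level adds its hit time, and only its misses pay the cost of the next level. A minimal sketch, assuming per-level hit times and local miss rates are known (the function name, parameters and sample latencies are hypothetical):

```python
from typing import List, Tuple


def amat(levels: List[Tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time for a multi-level cache.

    levels: (hit_time, local_miss_rate) pairs ordered from L1 outward.
    memory_latency: cost of going to main memory after the last level misses.
    """
    # Start from main memory and fold inward: the miss penalty of each level
    # is the average access time of everything beyond it.
    penalty = memory_latency
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty
```

With an L1 of 1 cycle and 5% misses, an L2 of 10 cycles and 20% local misses, and 100-cycle memory, the L2 average is 10 + 0.2 × 100 = 30 cycles and the overall AMAT is 1 + 0.05 × 30 = 2.5 cycles.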

Section: Storage Optimization (mentioning; confidence: 99%)

Parallel Gauss-Jordan Elimination on Two-Dimensional Constant Bandwidth Storage

Liu; Xiong

2017

IJGDC


As commonly used in two-dimensional magnetotelluric (MT) inversion, the Occam inversion method, though characterized by stable convergence and independence from the initial model, performs relatively poorly in time, with Gauss-Jordan elimination as one of its most time-consuming parts. For a symmetric banded coefficient matrix on two-dimensional constant bandwidth storage, applying a sequential dispatch strategy to the workload on the lines of the work triangle in the parallel algorithm leads to load imbalance. Therefore, a strategy based on paired dispatch is presented to solve this problem; the algorithm's effect on a shared-memory parallel system is then verified, and methods for optimizing the algorithm's performance are investigated. Experimental results compared with the serial algorithm show that these methods effectively improve the performance of the algorithm, yielding an overall speedup of 3.72 for the parallel Gauss-Jordan algorithm, and that the MT Occam inversion algorithm based on this parallel algorithm also demonstrates good speedup.
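The abstract does not spell out the paired-dispatch strategy. One common reading, assumed here, is that line i of a triangular sweep costs work proportional to n - i, so pairing line i with line n - 1 - i gives every pair the same cost, whereas contiguous (sequential) blocks leave the first thread with most of the work. The functions and the cost model below are hypothetical illustrations, not the authors' code:

```python
from typing import List


def sequential_dispatch(n_rows: int, n_threads: int) -> List[List[int]]:
    """Give each thread one contiguous block of rows."""
    block = -(-n_rows // n_threads)  # ceiling division
    return [list(range(t * block, min((t + 1) * block, n_rows))) for t in range(n_threads)]


def paired_dispatch(n_rows: int, n_threads: int) -> List[List[int]]:
    """Pair row i with row n_rows - 1 - i, then deal the pairs round-robin."""
    pairs: List[List[int]] = []
    lo, hi = 0, n_rows - 1
    while lo < hi:
        pairs.append([lo, hi])
        lo += 1
        hi -= 1
    if lo == hi:  # odd row count: the middle row stands alone
        pairs.append([lo])
    assignment: List[List[int]] = [[] for _ in range(n_threads)]
    for k, pair in enumerate(pairs):
        assignment[k % n_threads].extend(pair)
    return assignment


def load(rows: List[int], n_rows: int) -> int:
    """Assumed cost model: row i of the work triangle costs n_rows - i."""
    return sum(n_rows - i for i in rows)
```

For 8 rows on 2 threads, contiguous blocks give per-thread loads of 26 and 10, while the paired version gives 18 and 18, because every pair sums to n + 1 = 9.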
