Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in today's embedded systems. These architectures offer the potential for energy-efficient computing if the application task is mapped to the right core. Realizing this potential is challenging due to the complex and evolving nature of both hardware and applications. This paper presents an automatic approach for mapping OpenCL kernels onto heterogeneous multi-cores for a given optimization criterion, whether that is faster runtime, lower energy consumption, or a trade-off between the two. This is achieved by developing a machine-learning-based approach to predict which processor should run the OpenCL kernel and the host program, and at what frequency the processor should operate. Instead of hand-tuning a model for each optimization metric, we use machine learning to develop a unified framework that first automatically learns the optimization heuristic for each metric offline, then uses the learned knowledge to schedule OpenCL kernels at runtime based on the code and runtime information of the program. We apply our approach to a set of representative OpenCL benchmarks and evaluate it on an ARM big.LITTLE mobile platform. Our approach achieves over 93% of the performance delivered by a perfect predictor. On average, we obtain 1.2x, 1.6x, and 1.8x improvements in runtime, energy consumption, and the energy delay product, respectively, compared to an existing heterogeneity-aware OpenCL task mapping scheme.
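
To illustrate the kind of predictive scheduler the abstract describes, the sketch below shows an offline-trained classifier that maps kernel features to a (processor, frequency) configuration. It is only a minimal, hypothetical example: the decision-tree model, the feature set, and the configuration labels are assumptions for illustration, not the paper's actual framework.

```python
# Illustrative sketch only: offline-learned mapping from kernel features to a
# (processor, frequency) configuration for one optimization metric (e.g. EDP).
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: static code features (e.g. memory ops, compute
# ops, transfer size) plus a runtime feature (input size), each labelled with
# the best configuration found by offline measurement.
X_train = [
    [120, 340, 4096, 1_000_000],   # feature vector for one kernel/input pair
    [ 80, 900,  512,    10_000],
]
y_train = [
    "big@1.8GHz",       # best configuration for this training sample
    "LITTLE@1.0GHz",
]

model = DecisionTreeClassifier().fit(X_train, y_train)

def schedule(kernel_features):
    """At runtime, predict which processor and frequency to use for a kernel."""
    return model.predict([kernel_features])[0]

print(schedule([100, 500, 2048, 500_000]))  # e.g. "big@1.8GHz"
```

In the unified framework the abstract outlines, one such model would be learned per optimization metric (runtime, energy, or energy delay product), and the runtime system would query the model matching the user's chosen criterion before launching each kernel.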