2020
DOI: 10.1145/3431731
A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels

Abstract: Characterizing compute kernel execution behavior on GPUs for efficient task scheduling is a non-trivial task. We address this with a simple model enabling portable and fast predictions among different GPUs using only hardware-independent features. This model is built based on random forests using 189 individual compute kernels from benchmarks such as Parboil, Rodinia, Polybench-GPU, and SHOC. Evaluation of the model performance using cross-validation yields a median Mean Average Percentage Error (MAPE) of 8.86…
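The methodology the abstract describes (a random forest trained on hardware-independent kernel features, evaluated by cross-validated MAPE) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the feature set and the synthetic data are assumptions, and only the overall pipeline (random forest + cross-validation + MAPE) follows the abstract.

```python
# Illustrative sketch of the paper's evaluation pipeline (not the authors'
# code): a random-forest regressor on hardware-independent kernel features,
# scored by cross-validated Mean Average Percentage Error (MAPE).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_kernels = 189  # number of compute kernels used in the paper

# Hypothetical hardware-independent features (e.g. instruction-mix
# fractions); the real feature set is defined in the paper.
X = rng.uniform(0.0, 1.0, size=(n_kernels, 4))
# Synthetic "execution time" target with a mild nonlinearity and noise.
y = 1.0 + 3.0 * X[:, 0] + 2.0 * X[:, 1] * X[:, 2] \
    + 0.1 * rng.normal(size=n_kernels)

# Out-of-fold predictions via 5-fold cross-validation.
model = RandomForestRegressor(n_estimators=100, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=5)

# MAPE over all held-out predictions.
mape = 100.0 * np.mean(np.abs((y - y_pred) / y))
print(f"cross-validated MAPE: {mape:.2f}%")
```

Because the target here is synthetic, the printed MAPE is not comparable to the paper's 8.86% figure; the sketch only shows the shape of the evaluation.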

Cited by 21 publications (12 citation statements) · References 41 publications
“…Dublish et al [13] and Ardalani et al [5] used regression to predict the performance of GPU architecture. Braun et al [8] and O'Neal et al [40] proposed a model based on random forests. Wu et al [53] exploited neural networks, and Guerreiro et al [21] developed a recurrent neural network-based model, which takes as input the sequence of PTX instructions.…”
Section: ML-based Performance Evaluation (mentioning)
confidence: 99%
“…Approximation techniques have also been used over the years to accelerate the simulation of individual components [11], [12]. E.g., by using simple core models (also known as 1-IPC core models) such as those implemented in Sniper [13] and CMP$im [14] if the interest is placed on evaluating the cache hierarchy or the memory system.…”
Section: B. Overview of Simulation Techniques (mentioning)
confidence: 99%
“…Baldini et al [5] use existing OpenMP applications and supervised learning to predict the potential GPU execution speedup among different vendors. Brown et al [8] present a model that yields accurate speedup predictions from a small set of features, while also being portable across Nvidia GPUs with different capabilities. Adams et al [1] propose a novel scheduling algorithm for the Halide programming language that targets image processing pipelines.…”
Section: Related Work (mentioning)
confidence: 99%