Predictable GPUs Frequency Scaling for Energy and Performance

Fan, Kaijie; Cosenza, Biagio; Juurlink, Ben

doi:10.1145/3337821.3337833

Cited by 28 publications

(24 citation statements)

References 30 publications

“…al. [12], is to consider a multiobjective optimization problem, with a set of Pareto-optimal solutions. In other words, one could search for the V-F configurations that maximize the speedup and minimize the normalized energy, i.e., the configurations that are not dominated by any other configuration.…”

Section: B DVFS Impact On Application Behavior (mentioning)

confidence: 99%
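The non-dominated selection described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the `Config` type, the frequency values, and the speedup and normalized-energy numbers are all made-up assumptions.

```python
from typing import NamedTuple


class Config(NamedTuple):
    core_mhz: int       # hypothetical core frequency
    mem_mhz: int        # hypothetical memory frequency
    speedup: float      # relative to a baseline; higher is better
    norm_energy: float  # relative to a baseline; lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is at least as good as b in both objectives and strictly better in one."""
    no_worse = a.speedup >= b.speedup and a.norm_energy <= b.norm_energy
    better = a.speedup > b.speedup or a.norm_energy < b.norm_energy
    return no_worse and better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Made-up measurements, for illustration only.
CONFIGS = [
    Config(600, 3500, 0.70, 0.85),
    Config(800, 3500, 0.85, 0.80),
    Config(1000, 3500, 1.00, 1.00),
    Config(1000, 810, 0.60, 0.95),
    Config(1200, 3500, 1.10, 1.20),
]
FRONT = pareto_front(CONFIGS)
```

With these sample values, the 600 MHz and low-memory-clock configurations drop out because the 800 MHz configuration is both faster and more energy-efficient. The quadratic scan is adequate for the small number of V-F settings a GPU typically exposes.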

“…In particular, Alavani et al. [10] presented a way to predict the execution time of an application prior to its execution, with an average prediction error of 26.9% on a Tesla K20 GPU (Kepler). On the other hand, Fan et al. [12] developed DVFS-aware static models for performance and energy of GPU devices. The two models are trained based on a static vector of 10 features, where each component represents the count of a type of instructions.…”

Section: Related Work (mentioning)

confidence: 99%
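A static feature vector of per-type instruction counts, as mentioned in the statement above, could be built along these lines. This is a rough sketch under stated assumptions: the excerpt does not list the 10 instruction classes, so the seven categories in `FEATURES`, the `classify` rules, and the sample kernel are hypothetical, and the line-based parsing only handles simple single-line PTX statements.

```python
import re
from collections import Counter

# Hypothetical feature order; the cited model's actual 10 instruction
# classes are not given in the excerpt.
FEATURES = ["int", "float", "global_mem", "shared_mem", "other_mem", "control", "other"]

PREDICATE = re.compile(r"^@!?%\w+\s+")  # guard such as "@%p1 " or "@!%p1 "


def classify(opcode: str) -> str:
    """Map a PTX opcode such as 'ld.global.f32' to one of the FEATURES names."""
    parts = opcode.split(".")
    root = parts[0]
    if root in ("ld", "st"):
        if "global" in parts:
            return "global_mem"
        if "shared" in parts:
            return "shared_mem"
        return "other_mem"
    if root in ("bra", "ret", "call", "exit", "bar"):
        return "control"
    if any(p in ("f16", "f32", "f64") for p in parts[1:]):
        return "float"
    if any(p[:1] in ("s", "u") and p[1:].isdigit() for p in parts[1:]):
        return "int"
    return "other"


def feature_vector(ptx: str) -> list[int]:
    """Count instructions per category, skipping directives, braces and labels."""
    counts = Counter()
    for raw in ptx.splitlines():
        line = raw.split("//")[0].strip()  # drop trailing comments
        if not line or line[0] in ".{}" or line.endswith(":"):
            continue
        line = PREDICATE.sub("", line)     # drop a leading predicate guard
        opcode = line.split()[0].rstrip(";")
        counts[classify(opcode)] += 1
    return [counts[name] for name in FEATURES]


# Hand-written sample kernel, for illustration only.
SAMPLE_PTX = """
.visible .entry saxpy(.param .u64 x, .param .u64 y)
{
    ld.param.u64 %rd1, [x];
    mov.u32 %r1, %tid.x;
    setp.ge.s32 %p1, %r1, %r2;
    @%p1 bra DONE;
    ld.global.f32 %f1, [%rd3];
    ld.global.f32 %f2, [%rd4];
    fma.rn.f32 %f3, %f1, %f0, %f2;
    st.global.f32 [%rd4], %f3;
DONE:
    ret;
}
"""
VECTOR = feature_vector(SAMPLE_PTX)
```

Because the counts come from the compiled PTX alone, such a vector is available before the kernel ever runs, which is what makes the model static.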

“…The main use-case of predictive models, such as the ones herein proposed, is to perform the DVFS management to maximize the energy efficiency of the computing system. Considering a multi-objective optimization problem with a set of Pareto-optimal solutions, similar to the one that was proposed in [12], this technique can be a useful approach to find the best V-F configurations for different applications. Fig.…”

Section: Pareto-optimal Solutions (mentioning)

confidence: 99%

“…An alternative and highly promising approach consists in providing predictions of the DVFS impact on the application behavior, prior to its execution. This alternative relies on using the GPU assembly of the kernels [10]-[12] (described in the NVIDIA PTX ISA [13]), which can be obtained at compile-time. Although this approach is expected to yield less accurate results (when compared with state-of-the-art run-time models), it allows the first execution of an application to be done at a close to the optimal V-F configuration.…”

Section: Introduction (mentioning)

confidence: 99%

“…To that end, the proposed methodology uses the PTX assembly code given by the compiler. However, unlike previous works that simply rely on general code statistics, such as the histogram of instructions in the PTX code [12], [14], the proposed approach takes a step further and considers the specific sequence of kernel instructions, to improve the prediction accuracy. To model how the pattern of instructions stresses the GPU components, thus contributing to different performance, power and energy scalings, a deep neural network is used.…”

Section: Introduction (mentioning)

confidence: 99%

GPU Static Modeling Using PTX and Deep Structured Learning

et al. 2019

In the quest for exascale computing, energy-efficiency is a fundamental goal in high-performance computing systems, typically achieved via dynamic voltage and frequency scaling (DVFS). However, this type of mechanism relies on having accurate methods of predicting the performance and power/energy consumption of such systems. Unlike previous works in the literature, this research focuses on creating novel GPU predictive models that do not require run-time information from the applications. The proposed models, implemented using recurrent neural networks, take into account the sequence of GPU assembly instructions (PTX) and can accurately predict changes in the execution time, power and energy consumption of applications when the frequencies of different GPU domains (core and memory) are scaled. Validated with 24 applications on GPUs from different NVIDIA microarchitectures (Turing, Volta, Pascal and Maxwell), the proposed models attain a significant accuracy. Particularly, the obtained power consumption scaling model provides an average error rate of 7.9% (Tesla T4), 6.7% (Titan V), 5.9% (Titan Xp) and 5.4% (GTX Titan X), which is comparable to state-of-the-art run-time counter-based models. When using the models to select the minimum-energy frequency configuration, significant energy savings can be attained: 8.0% (Tesla T4), 6.0% (Titan V), 29.0% (Titan Xp) and 11.5% (GTX Titan X).

Fast selection of compiler optimizations using performance prediction with graph neural networks

Rosário, Silva, Zanella, et al. 2022

Concurrency and Computation

Tuning application performance on modern computing infrastructures involves choices in a vast design space as modern computing architectures can have several complex structures impacting performance. Moreover, different applications use these structures in different ways, leading to a challenging performance function. Consequently, it is hard for compilers or experts to find optimal compilation parameters for an application that maximize such a performance function. One approach to tackle this problem is to evaluate many possible optimization plans and select the best among them. However, executing an application to measure its performance for every plan can be very expensive. To tackle this problem, previous work has investigated the use of Machine Learning techniques to quickly predict the performance of applications without executing them. In this work, we evaluate the use of graph neural networks (GNN) to make fast predictions without executing the application to guide the selection of good optimization sequences. We propose a GNN architecture to make such predictions. We train and test it using 30 thousand different compilation plans applied to 300 different applications, using ARM64 and LLVM IR code representations as input. Our results indicate that GNNs can learn features from the control and data flow graph to outperform nongraph-aware Machine Learning models. Our GNN architecture achieved 91% accuracy in our dataset compared to 79% when using a nongraph-aware architecture, taking only 16 ms to predict a given input. If the application being optimized took an average of 10 s to execute, and we evaluated 1000 optimization sequences, it would take almost 3 h to assess all pairs, but only 16 s with our GNN.
