An Energy-Efficient Unified Register File for Mobile GPUs

Chu, Slo‐Li; Hsiao, Chih-Chieh; Hsieh, Chiu-Cheng

doi:10.1109/euc.2011.15

Cited by 18 publications

(11 citation statements)

References 10 publications

“…(1) DVFS (dynamic voltage/frequency scaling) based techniques [Jiao et al 2010; Ma et al 2012; Cebrian et al 2012; Lee et al 2011; Sheaffer et al 2005b; Chang et al 2008; Ren 2011; Anzt et al 2011; Ren et al 2012; Zhao et al 2012; Huo et al 2012; Keller and Gruber 2010; Abe et al 2012; Park et al 2006; Paul et al 2013] (2) CPU-GPU workload division based techniques [Takizawa et al 2008; Rofouei et al 2008; Ma et al 2012; Hamano et al 2009] and GPU workload consolidation (3) Architectural techniques for saving energy in specific GPU components, such as caches [Lee et al 2011; Lashgar et al 2013; Arnau et al 2012; Rogers et al 2013; Lee and Kim 2012], global memory [Wang et al 2013; Rhu et al 2013], pixel shader [Pool et al 2011], vertex shader [Pool et al 2008], core data-path, registers, pipeline and thread-scheduling [Chu et al 2011; Gebhart et al 2011; Jing et al 2013; Gilani et al 2012; Sethia et al 2013].…”

Section: Overview (mentioning)

confidence: 99%

A Survey of Methods for Analyzing and Improving GPU Energy Efficiency

2014

Recent years have witnessed phenomenal growth in the computational capabilities and applications of GPUs. However, this trend has also led to a dramatic increase in their power consumption. This article surveys research works on analyzing and improving energy efficiency of GPUs. It also provides a classification of these techniques on the basis of their main research idea. Further, it attempts to synthesize research works that compare the energy efficiency of GPUs with other computing systems (e.g., FPGAs and CPUs). The aim of this survey is to provide researchers with knowledge of the state of the art in GPU power management and motivate them to architect highly energy-efficient GPUs of tomorrow. ACM Reference Format: Sparsh Mittal and Jeffrey S. Vetter. 2014. A survey of methods for analyzing and improving GPU energy efficiency.

“…Further experimental data with the same SoC shows a peak consumption of the GPU 50% higher than the peak consumption of the CPU [5]. The current trend towards more realistic graphics and therefore, more power hungry applications [6] is just aggravating this issue, so, improving the energy efficiency of mobile GPUs is key for future designs [7], [8], [9], [10], [11], [12], [13], [14], [15]. The development of energy-efficient solutions is a requirement to make possible a richer user experience in these platforms.…”

Section: Introduction (mentioning)

confidence: 99%

Visibility Rendering Order: Improving Energy Efficiency on Mobile GPUs through Frame Coherence

Lucas, Marcuello, Parcerisa, et al. 2019

IEEE Trans. Parallel Distrib. Syst.

During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image, thus wasting precious time and energy. To help discard occluded surfaces, most current GPUs include an Early-Depth test before the fragment processing stage. However, to be effective it requires that opaque objects are processed in a front-to-back order. Depth sorting and other occlusion culling techniques at the object level incur overheads that are only offset for applications having substantial depth and/or fragment shading complexity, which is often not the case in mobile workloads. We propose a novel architectural technique for GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware by exploiting the fact that the objects in graphics animated applications tend to keep their relative depth order across consecutive frames (temporal coherence). Since order relationships are already tested by the Depth Test, VRO incurs minimal energy overheads because it just requires adding a small hardware unit to capture that information and use it later to guide the rendering of the following frame. Moreover, unlike other approaches, this unit works in parallel with the graphics pipeline without any performance overhead. We illustrate the benefits of VRO using various unmodified commercial 3D applications for which VRO achieves 27% speed-up and 15.8% energy reduction on average over a state-of-the-art mobile GPU.

“…While games of those characteristics usually do not involve complex scenes and cutting-edge effects, rendering their scenes still requires a substantial amount of power, a limited resource in battery-operated devices. Consequently, reducing the energy consumption of the GPU is a major concern of hardware and software designers [5], [6], [7], [8], [9]. Figure 1 shows the average power consumption and GPU load for the Android desktop (without animations), for several commercial Android games and the Antutu benchmark [11], divided into the CPU phase and the GPU phase (Antutu3D).…”

Section: Introduction (mentioning)

confidence: 99%

Rendering Elimination: Early Discard of Redundant Tiles in the Graphics Pipeline

Anglada, Lucas, Parcerisa, et al. 2019

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)

GPUs are one of the most energy-consuming components for real-time rendering applications, since a large number of fragment shading computations and memory accesses are involved. Main memory bandwidth is especially taxing battery-operated devices such as smartphones. Tile-Based Rendering GPUs divide the screen space into multiple tiles that are independently rendered in on-chip buffers, thus reducing memory bandwidth and energy consumption. We have observed that, in many animated graphics workloads, a large number of screen tiles have the same color across adjacent frames. In this paper, we propose Rendering Elimination (RE), a novel micro-architectural technique that accurately determines if a tile will be identical to the same tile in the preceding frame before rasterization by means of comparing signatures. Since RE identifies redundant tiles early in the graphics pipeline, it completely avoids the computation and memory accesses of the most power consuming stages of the pipeline, which substantially reduces the execution time and the energy consumption of the GPU. For widely used Android applications, we show that RE achieves an average speedup of 1.74x and energy reduction of 43% for the GPU/Memory system, surpassing by far the benefits of Transaction Elimination, a state-of-the-art memory bandwidth reduction technique available in some commercial Tile-Based Rendering GPUs.
