Early Visibility Resolution for Removing Ineffectual Computations in the Graphics Pipeline

Anglada, Martí; Lucas, Enrique de; Parcerisa, Joan-Manuel; Aragón, Juan L.; González, Antonio

doi:10.1109/hpca.2019.00015

Cited by 12 publications

(7 citation statements)

References 11 publications

“…Visibility Rendering Order (VRO) [8] is a technique that sorts objects in a 3D scene based on the front-to-back order from the preceding frame. Another recent technique is Early Visibility Resolution (EVR) [10] which uses the farthest point for each tile in a frame to predict occluded primitives in the next frame, with the aim of processing those presumably occluded primitives as the final ones. Both VRO and EVR use information from the preceding frame to re-sort the order in which objects/primitives are processed in the current frame to increase the effectiveness of the Early Depth Test.…”

Section: Related Work On Visibility Determination (mentioning)

confidence: 99%
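
The reordering idea described in the statement above can be illustrated with a minimal sketch. It is not the EVR hardware mechanism itself: the `Primitive` record, the single-tile coverage, and the `prev_farthest` dictionary are hypothetical simplifications introduced here, and smaller depth values are assumed to be nearer to the camera.

```python
from dataclasses import dataclass


@dataclass
class Primitive:
    name: str
    tile: int          # hypothetical: index of the one tile this primitive covers
    near_depth: float  # nearest depth of the primitive (smaller = closer to camera)


def reorder_by_predicted_visibility(prims, prev_farthest):
    """Stable partition: predicted-visible primitives first, predicted-occluded last.

    prev_farthest maps a tile index to the farthest visible depth recorded for
    that tile in the previous frame (an assumed data layout).
    """
    def predicted_occluded(p):
        far = prev_farthest.get(p.tile)
        # With no history for the tile, conservatively treat the primitive as visible.
        return far is not None and p.near_depth > far

    visible = [p for p in prims if not predicted_occluded(p)]
    occluded = [p for p in prims if predicted_occluded(p)]
    return visible + occluded


prims = [
    Primitive("wall_behind", tile=0, near_depth=0.9),
    Primitive("crate", tile=0, near_depth=0.3),
    Primitive("sky", tile=1, near_depth=0.99),
]
order = reorder_by_predicted_visibility(prims, prev_farthest={0: 0.5})
print([p.name for p in order])
```

Because the predicted-occluded primitives are only moved to the end rather than discarded, a wrong prediction in this sketch costs performance but not correctness: the depth test still resolves final visibility.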

“…In particular, fragments that appear behind others (for a given camera viewpoint) are not visible in the final scene. The solution to the visibility problem is not unique and multiple approaches can be found in the literature [8], [9], [10] being the so-called Depth Test (or Z-Test) [11], which performs a visibility test at the pixel granularity, the most widely implemented technique in contemporary GPUs.…”

Section: Introduction (mentioning)

confidence: 99%

Omega-Test: A Predictive Early-Z Culling to Improve the Graphics Pipeline Energy-Efficiency

Corbalan-Navarro, Aragón, Anglada, et al. 2022

IEEE Trans. Visual. Comput. Graphics

Self Cite

The most common task of GPUs is to render images in real time. When rendering a 3D scene, a key step is to determine which parts of every object are visible in the final image. There are different approaches to solving the visibility problem, the Z-Test being the most common. A main factor that significantly penalizes the energy efficiency of a GPU, especially in the mobile arena, is the so-called overdraw, which happens when a portion of an object is shaded and rendered but ultimately occluded by another object. This useless work results in a waste of energy; however, a conventional Z-Test only avoids a fraction of it. In this paper we present a novel microarchitectural technique, the Ω-Test, to drastically reduce the overdraw on a Tile-Based Rendering (TBR) architecture. Graphics applications have a great degree of inter-frame coherence, which makes the output of a frame very similar to the previous one. The proposed approach leverages this frame-to-frame coherence by using the resulting information of the Z-Test for a tile (a buffer containing all the calculated pixel depths for a tile), which is discarded by current GPUs, to predict the visibility of the same tile in the next frame. As a result, the Ω-Test identifies occluded parts of the scene early and avoids the rendering of non-visible surfaces, eliminating costly computations and off-chip memory accesses. Our experimental evaluation shows average energy-delay product (EDP) savings in the overall GPU/Memory system of 26.4% and an average speedup of 16.3% for the evaluated benchmarks.
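
To make the overdraw argument concrete, the following is a minimal single-pixel sketch. It shows why an early depth test alone avoids only part of the overdraw when fragments arrive back to front, and how a depth carried over from the previous frame could filter more of them. The function name and the `predicted_depth` parameter are assumptions made for illustration. The abstract does not say how the Ω-Test handles mispredictions, so that is not modeled here.

```python
def shaded_fragments(depths, predicted_depth=None):
    """Count fragments shaded at one pixel under an early depth test.

    depths: fragment depths in submission order (smaller = nearer).
    predicted_depth: hypothetical final depth of this pixel in the previous
    frame; fragments farther than it are skipped before shading.
    """
    current = float("inf")  # depth buffer cleared to "far"
    shaded = 0
    for d in depths:
        if predicted_depth is not None and d > predicted_depth:
            continue  # predicted occluded: skipped before the depth test
        if d < current:  # conventional early depth test
            current = d
            shaded += 1
    return shaded


back_to_front = [0.9, 0.6, 0.2]
front_to_back = [0.2, 0.6, 0.9]
print(shaded_fragments(back_to_front))                       # every fragment passes
print(shaded_fragments(front_to_back))                       # only the nearest is shaded
print(shaded_fragments(back_to_front, predicted_depth=0.2))  # prediction filters the rest
```

In this sketch the back-to-front order shades all three fragments although only one is visible, which is the overdraw the abstract refers to. With a perfectly coherent previous frame, the predicted depth reduces that to a single shaded fragment regardless of submission order.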

“…Dong et al [25] presented a crowd rendering system which integrates level-of-detail and visibility culling techniques for efficiently rendering an animated crowd. Anglada et al [4] proposed a method to estimate the visibility at two different levels for animated scenes. Koch and Wimmer [21] presented a visibility computation method by sampling locations and determining a potentially visible set of triangles using ray casting.…”

Section: Previous Work (mentioning)

confidence: 99%

“…The general idea of these visibility culling methods is to save the computational resources for processing geometric primitives that do not contribute to the final image by excluding those primitives at an early stage of the graphics pipeline [1,2]. Many visibility-culling methods use spatial and temporal coherence to improve the efficiency since adjacent geometric primitives or frames show similar visibilities [3][4][5]. Geometric clustering is also useful for visibility culling by processing adjacent primitives together [6][7][8][9][10][11].…”

Section: Introduction (mentioning)

confidence: 99%

Mesh Clustering and Reordering Based on Normal Locality for Efficient Rendering

Kim, Lee 2022

Symmetry

Recently, the size of models for real-time rendering has been significantly increasing for realism, and many graphics applications are being developed in mobile devices with relatively insufficient hardware power. Therefore, improving rendering speed is still important in graphics. Back-face culling is one of the core speed-up techniques to remove the back-facing polygons that are not drawn in the result image. In this paper, we present a mesh clustering and reordering method based on normal coherence for efficient back-face culling at an earlier stage than the current method, which removes back faces after the vertex shader on the GPU. In the pre-computation, our method first vertically clusters the mesh into multiple stripes based on the latitude of the face normal vector and sorts each stripe in ascending order of longitude. At runtime, our method computes a potentially visible set of faces at the current camera view by excluding back faces from the clustered and reordered faces list, and draws only the potentially visible set. Experiments have shown that the rendering using our method is more efficient than traditional methods, especially for large and static models.
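
The pre-computation step described in this abstract (clustering faces into stripes by the latitude of their normal, then sorting each stripe by longitude) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of the y axis as the polar axis, the equal-width stripes, and the requirement of non-zero normals are all introduced here. The runtime step that selects the potentially visible set is not shown.

```python
import math


def normal_lat_lon(n):
    """Latitude and longitude (radians) of a non-zero normal, with y as the polar axis (assumed)."""
    x, y, z = n
    length = math.sqrt(x * x + y * y + z * z)
    lat = math.asin(max(-1.0, min(1.0, y / length)))  # in [-pi/2, pi/2]
    lon = math.atan2(z, x)                            # in [-pi, pi]
    return lat, lon


def cluster_and_sort(face_normals, num_stripes):
    """Group face indices into equal-width latitude stripes, each sorted by ascending longitude."""
    stripes = [[] for _ in range(num_stripes)]
    for face_id, n in enumerate(face_normals):
        lat, lon = normal_lat_lon(n)
        idx = int((lat + math.pi / 2) / math.pi * num_stripes)
        idx = min(idx, num_stripes - 1)  # a latitude of exactly +pi/2 falls in the last stripe
        stripes[idx].append((lon, face_id))
    return [[face_id for _, face_id in sorted(s)] for s in stripes]


normals = [(1, 1, 0), (0, -1, 1), (-1, 1, 0), (0, 1, 1), (1, -1, 0)]
print(cluster_and_sort(normals, num_stripes=2))
```

Sorting by longitude within a stripe places faces with similar orientation next to each other, so a range of back-facing normals for a given view would map to a contiguous run of the list rather than scattered entries.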

“…Graphics. While integrated GPUs from ARM and Nvidia have been gathered a number of research interests recently [1,7,11], it is challenging to extend the successful optimization tricks to Intel Graphics. To the best of our knowledge, there is virtually no published work that provides detailed insights regarding the methodology of performance optimization on Intel Graphics for various CNN models.…”

Section: Optimization Consideration On Intel (mentioning)

confidence: 99%

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

Wang, Chen, Liu, et al. 2019

Proceedings of the 48th International Conference on Parallel Processing

Modern deep learning applications increasingly push model inference to edge devices for multiple reasons, such as achieving shorter latency, relieving the burden on the network connecting to the cloud, and protecting user privacy. The Convolutional Neural Network (CNN) is one of the most widely used model families in these applications. Given the high computational complexity of CNN models, it is favorable to execute them on the integrated GPUs of edge devices, which are ubiquitous and have more power and better energy efficiency than the accompanying CPUs. However, programming integrated GPUs efficiently is challenging due to the variety of their architectures and programming interfaces. This paper proposes an end-to-end solution to execute CNN model inference on integrated GPUs at the edge, which uses a unified IR to represent and optimize vision-specific operators on integrated GPUs from multiple vendors, and leverages machine learning-based scheduling search schemes to optimize computationally intensive operators like convolution. Our solution also provides a fallback mechanism for operators not suitable or convenient to run on GPUs. The evaluation results suggest that, compared to state-of-the-art solutions backed by the vendor-provided high-performance libraries on Intel Graphics, ARM Mali GPU, and Nvidia integrated Maxwell GPU, our solution achieves similar, or even better (up to 1.62×), performance on a number of popular image classification and object detection models. In addition, our solution has wider model coverage and is more flexible in embracing new models. Our solution has been adopted in production services in AWS and is open-sourced.
