Piko

Patney, Anjul; Tzeng, Stanley; Seitz, Kerry A.; Owens, John D.

doi:10.1145/2766973

Cited by 15 publications

(1 citation statement)

References 37 publications


“…A more current area of research where we encounter the problem of massively parallel vertex processing is software rendering on the modern GPU. Noteworthy examples of GPU software rendering pipelines include Freepipe [Liu et al 2010], CUDARaster [Laine and Karras 2011], and Piko [Patney et al 2015]. They all use the compute mode of the GPU (typically on top of the CUDA [NVIDIA 2016] ecosystem) to implement rasterization and fragment shading, but lack mechanisms for vertex reuse.…”

Section: Related Work (mentioning)

confidence: 99%

On-the-fly Vertex Reuse for Massively-Parallel Software Geometry Processing

Kenzel

Kerbl

Tatzgern

et al. 2018

Proc. ACM Comput. Graph. Interact. Tech.


Figure 1: (a) Shading rate for XYZRGB Dragon. (b) Rendering a scene from The Witcher 3. (c) Computation of inner and outer mesh envelopes. Reducing the number of shader invocation during rendering is essential to guarantee high performance. Traditionally, redundant vertex shading can be bypassed using a post-transform cache, but its poor scalability makes the vertex cache a poor choice in massively parallel environments. (a) The batch-based approaches we explore in this work show good reuse characteristics on modern GPUs (green vertices are shaded only once, dark red six times). (b, c) We evaluate static and dynamic batching in a variety of applications, e.g., rasterization of captured game scenes and computation of mesh simplification envelopes. The Witcher 3: Wild Hunt screenshot courtesy of CD PROJEKT S.A.; used with permission.

Abstract: Compute-mode rendering is becoming more and more attractive for non-standard rendering applications, due to the high flexibility of compute-mode execution. These newly designed pipelines often include streaming vertex and geometry processing stages. In typical triangle meshes, the same transformed vertex is on average required six times during rendering. To avoid redundant computation, a post-transform cache is traditionally suggested to enable reuse of vertex processing results. However, traditional caching neither scales well as the hardware becomes more parallel, nor can be efficiently implemented in a software design. We investigate alternative strategies to reusing vertex shading results on-the-fly for massively parallel software geometry processing. Forming static and dynamic batching on the data input stream, we analyze the effectiveness of identifying potential local reuse based on sorting, hashing, and efficient intra-thread-group communication. Altogether, we present four vertex reuse strategies, tailored to modern parallel architectures. Our simulations showcase that our batch-based strategies significantly outperform parallel caches in terms…


A 3D graphics rendering pipeline implementation based on the openCL massively parallel processing

Kim

Baek

2021

J Supercomput


A Cross-platform Evaluation of Graphics Shader Compiler Optimization

Crawford

O’Boyle

2018

2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)


For real-time graphics applications such as games and virtual reality, performance is crucial to provide a smooth user experience. Central to this is the performance of shader programs which render images on the GPU. The rise of low-level graphics APIs such as Vulkan means compilation tools play an increasingly important role in the graphics ecosystem. However, despite the importance of graphics, there is little published work on the impact of compiler optimization. This paper explores common features of graphics shaders, and examines the impact and applicability of common optimizations such as loop unrolling, and arithmetic reassociation. Combinations of optimizations are evaluated via exhaustive search across a wide set of shaders from the GFXBench 4.0 benchmark suite. Their impact is assessed across three desktop and two mobile GPUs from different vendors. We show that compiler optimization can have significant positive and negative impacts which vary across optimisations, benchmarks and platforms.
