(a) Shading rate for XYZRGB Dragon (b) Rendering a scene from The Witcher 3 (c) Computation of inner and outer mesh envelopes
Figure 1: Reducing the number of shader invocations during rendering is essential to guarantee high performance. Traditionally, redundant vertex shading can be bypassed using a post-transform cache, but its poor scalability makes the vertex cache a poor choice in massively parallel environments. (a) The batch-based approaches we explore in this work show good reuse characteristics on modern GPUs (green vertices are shaded only once, dark red six times). (b, c) We evaluate static and dynamic batching in a variety of applications, e.g., rasterization of captured game scenes and computation of mesh simplification envelopes. The Witcher 3: Wild Hunt screenshot courtesy of CD PROJEKT S.A.; used with permission.
ABSTRACT
Compute-mode rendering is becoming increasingly attractive for non-standard rendering applications, due to the high flexibility of compute-mode execution. These newly designed pipelines often include streaming vertex and geometry processing stages. In typical triangle meshes, the same transformed vertex is on average required six times during rendering. To avoid redundant computation, a post-transform cache is traditionally suggested to enable reuse of vertex processing results. However, traditional caching neither scales well as the hardware becomes more parallel, nor can it be implemented efficiently in software. We investigate alternative strategies for reusing vertex shading results on-the-fly in massively parallel software geometry processing. Forming static and dynamic batches on the input data stream, we analyze the effectiveness of identifying potential local reuse based on sorting, hashing, and efficient intra-thread-group communication. Altogether, we present four vertex reuse strategies, tailored to modern parallel architectures. Our simulations show that our batch-based strategies significantly outperform parallel caches in terms