2015
DOI: 10.1111/cgf.12666

Register Efficient Dynamic Memory Allocator for GPUs

Abstract: We compare five existing dynamic memory allocators optimized for GPUs and show their strengths and weaknesses. In the measurements, we use three generic evaluation tests proposed in the past and we add one with a real workload, where dynamic memory allocation is used in building the k-d tree data structure. Following the performance analysis we propose a new dynamic memory allocator and its variants that address the limitations of the existing dynamic memory allocators. The new dynamic memory allocator uses fe…

Cited by 15 publications (9 citation statements) · References: 20 publications
“…We have found out that for top–down build algorithms the times are higher by a factor of 10 to 20 for kd‐trees than for BVHs. This is due to the dynamic memory allocation required for kd‐trees, which is rather slow on a GPU even if we use an optimized memory allocator for GPUs [VH14]. Moreover, the triangle splitting also increases the number of triangles that must be repeatedly sorted in the lower levels of the tree, thus increasing the memory traffic.…”
Section: Results
confidence: 99%
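
To illustrate why in-kernel allocation sits on the critical path of such a top-down build, here is a minimal CUDA sketch of a split step that obtains child nodes from the built-in device-side malloc. The KdNode layout and the splitNodes kernel are illustrative assumptions made for this sketch, not the cited build algorithm or the allocator of [VH14].

```cuda
// Minimal sketch: one thread per node; interior nodes allocate their two
// children from the CUDA device heap. Contention on this heap is the cost
// the quoted statement refers to.
#include <cuda_runtime.h>

struct KdNode {
    float   split;   // split position along the axis
    int     axis;    // split axis, -1 marks a leaf
    KdNode *left;
    KdNode *right;
};

__global__ void splitNodes(KdNode **nodes, int numNodes)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numNodes) return;

    KdNode *n = nodes[i];
    if (n == nullptr || n->axis < 0) return;           // leaf: nothing to split

    // Device-side heap allocation; serializes badly when many threads allocate.
    KdNode *children = static_cast<KdNode *>(malloc(2 * sizeof(KdNode)));
    if (children == nullptr) return;                    // device heap exhausted

    children[0].axis = children[1].axis = -1;           // children start as leaves
    children[0].left = children[0].right = nullptr;
    children[1].left = children[1].right = nullptr;
    n->left  = &children[0];
    n->right = &children[1];
}

int main()
{
    // The device heap must be sized before any kernel calls malloc.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64u << 20);
    // ... create the root node, then launch splitNodes once per tree level ...
    return 0;
}
```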
“…Due to the hash-based addressing of available memory pages, threads can minimize contention for the same block of memory and scatter their block assignments for efficient random access (with a possible tradeoff of memory fragmentation). Vinkler and Havran [111] survey and experimentally compare existing GPU dynamic memory allocation schemes. The performance of each scheme varies across different criteria, including fragmentation of available memory blocks, per-block thread contention for atomic allocation requests, size and coalescing of requested memory by inter-warp threads, uniformity of the number of allocation requests per inter-warp thread, and dependence on the number of user-specified registers available to threads in each SM of the GPU.…”
Section: Separate Chaining
confidence: 99%
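
The following is a minimal CUDA sketch of the hash-based page scattering idea described in that statement (in the spirit of allocators such as ScatterAlloc). The PagePool layout, the hash constant, and the poolAlloc function are assumptions made for illustration, not any of the surveyed allocators themselves.

```cuda
// Minimal sketch: a pool of fixed-size pages; each thread hashes its ID to a
// starting page so concurrent requests rarely contend on the same atomic
// counter, at the cost of some fragmentation.
#include <cuda_runtime.h>

constexpr int NUM_PAGES  = 1024;   // pages in the pool
constexpr int PAGE_BYTES = 4096;   // bytes per page
constexpr int CHUNK      = 64;     // allocation granularity inside a page

struct PagePool {
    char *pages;             // NUM_PAGES * PAGE_BYTES of device memory
    int   fill[NUM_PAGES];   // bytes already handed out per page
};

__device__ void *poolAlloc(PagePool *pool, unsigned bytes)
{
    unsigned tid   = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned first = (tid * 2654435761u) % NUM_PAGES;   // multiplicative hash
    unsigned req   = (bytes + CHUNK - 1) / CHUNK * CHUNK;

    // Probe pages starting at the hashed one until one has enough room.
    for (int probe = 0; probe < NUM_PAGES; ++probe) {
        unsigned p   = (first + probe) % NUM_PAGES;
        int      off = atomicAdd(&pool->fill[p], (int)req);
        if (off + (int)req <= PAGE_BYTES)
            return pool->pages + p * PAGE_BYTES + off;  // success
        atomicSub(&pool->fill[p], (int)req);            // roll back, try next page
    }
    return nullptr;                                      // pool exhausted
}
```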
“…Following the optimization guide that is distributed with the Quasar platform [35], our presented solution avoids the use of dynamic memory such as described in [36] and dynamic parallelism, and instead operates on fixed-size buffers and vectors. As it has been described in previous sections, the SMoE algorithm requires the inverse of the covariance matrix R j to compute each component weight, and again to compute the final pixel color reconstruction.…”
Section: Quasar Implementation
confidence: 99%
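
A minimal CUDA sketch of the fixed-size-buffer alternative described in that statement: each thread inverts its component's covariance matrix in a small local array instead of allocating from the device heap. The 2x2 matrix size, the row-major layout, and the kernel name are assumptions for illustration, not the cited Quasar implementation.

```cuda
// Minimal sketch: per-component 2x2 covariance inverse using fixed-size local
// storage, so the kernel needs no device-side malloc at all.
#include <cuda_runtime.h>
#include <math.h>

__global__ void invertCovariances(const float *R, float *Rinv, int numComponents)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= numComponents) return;

    // Fixed-size scratch lives in registers/local memory: no heap contention.
    float m[4];
    for (int k = 0; k < 4; ++k)
        m[k] = R[4 * j + k];                 // row-major 2x2 matrix

    float det = m[0] * m[3] - m[1] * m[2];
    if (fabsf(det) < 1e-12f)
        det = 1e-12f;                        // guard against singular matrices

    float inv = 1.0f / det;
    Rinv[4 * j + 0] =  m[3] * inv;
    Rinv[4 * j + 1] = -m[1] * inv;
    Rinv[4 * j + 2] = -m[2] * inv;
    Rinv[4 * j + 3] =  m[0] * inv;
}
```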