2015
DOI: 10.1111/cgf.12666

Register Efficient Dynamic Memory Allocator for GPUs

Abstract: We compare five existing dynamic memory allocators optimized for GPUs and show their strengths and weaknesses. In the measurements, we use three generic evaluation tests proposed in the past and we add one with a real workload, where dynamic memory allocation is used in building the k-d tree data structure. Following the performance analysis we propose a new dynamic memory allocator and its variants that address the limitations of the existing dynamic memory allocators. The new dynamic memory allocator uses fe…

Cited by 15 publications (9 citation statements) · References: 20 publications
“…We have found out that for top–down build algorithms the times are higher by a factor of 10 to 20 for kd‐trees than for BVHs. This is due to the dynamic memory allocation required for kd‐trees, which is rather slow on a GPU even if we use an optimized memory allocator for GPUs [VH14]. Moreover, the triangle splitting also increases the number of triangles that must be repeatedly sorted in the lower levels of the tree, thus increasing the memory traffic.…”
Section: Results
confidence: 99%
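
To illustrate why in-kernel allocation sits on the critical path of such a top-down build, here is a minimal CUDA sketch of a split step that obtains child nodes from the built-in device-side malloc. The KdNode layout and the splitNodes kernel are illustrative assumptions made for this sketch, not the cited build algorithm or the allocator of [VH14].

```cuda
// Minimal sketch: one thread per node; interior nodes allocate their two
// children from the CUDA device heap. Contention on this heap is the cost
// the quoted statement refers to.
#include <cuda_runtime.h>

struct KdNode {
    float   split;   // split position along the axis
    int     axis;    // split axis, -1 marks a leaf
    KdNode *left;
    KdNode *right;
};

__global__ void splitNodes(KdNode **nodes, int numNodes)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numNodes) return;

    KdNode *n = nodes[i];
    if (n == nullptr || n->axis < 0) return;           // leaf: nothing to split

    // Device-side heap allocation; serializes badly when many threads allocate.
    KdNode *children = static_cast<KdNode *>(malloc(2 * sizeof(KdNode)));
    if (children == nullptr) return;                    // device heap exhausted

    children[0].axis = children[1].axis = -1;           // children start as leaves
    children[0].left = children[0].right = nullptr;
    children[1].left = children[1].right = nullptr;
    n->left  = &children[0];
    n->right = &children[1];
}

int main()
{
    // The device heap must be sized before any kernel calls malloc.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64u << 20);
    // ... create the root node, then launch splitNodes once per tree level ...
    return 0;
}
```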
“…Due to the hash-based addressing of available memory pages, threads can minimize contention for the same block of memory and scatter their block assignments for efficient random access (with a possible tradeoff of memory fragmentation). Vinkler and Havran [111] survey and experimentally compare existing GPU dynamic memory allocation schemes. The performance of each scheme varies across different criteria, including fragmentation of available memory blocks, per-block thread contention for atomic allocation requests, size and coalescing of requested memory by inter-warp threads, uniformity of the number of allocation requests per inter-warp thread, and dependence on the number of user-specified registers available to threads in each SM of the GPU.…”
Section: Separate Chaining
confidence: 99%
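
The following is a minimal CUDA sketch of the hash-based page scattering idea described in that statement (in the spirit of allocators such as ScatterAlloc). The PagePool layout, the hash constant, and the poolAlloc function are assumptions made for illustration, not any of the surveyed allocators themselves.

```cuda
// Minimal sketch: a pool of fixed-size pages; each thread hashes its ID to a
// starting page so concurrent requests rarely contend on the same atomic
// counter, at the cost of some fragmentation.
#include <cuda_runtime.h>

constexpr int NUM_PAGES  = 1024;   // pages in the pool
constexpr int PAGE_BYTES = 4096;   // bytes per page
constexpr int CHUNK      = 64;     // allocation granularity inside a page

struct PagePool {
    char *pages;             // NUM_PAGES * PAGE_BYTES of device memory
    int   fill[NUM_PAGES];   // bytes already handed out per page
};

__device__ void *poolAlloc(PagePool *pool, unsigned bytes)
{
    unsigned tid   = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned first = (tid * 2654435761u) % NUM_PAGES;   // multiplicative hash
    unsigned req   = (bytes + CHUNK - 1) / CHUNK * CHUNK;

    // Probe pages starting at the hashed one until one has enough room.
    for (int probe = 0; probe < NUM_PAGES; ++probe) {
        unsigned p   = (first + probe) % NUM_PAGES;
        int      off = atomicAdd(&pool->fill[p], (int)req);
        if (off + (int)req <= PAGE_BYTES)
            return pool->pages + p * PAGE_BYTES + off;  // success
        atomicSub(&pool->fill[p], (int)req);            // roll back, try next page
    }
    return nullptr;                                      // pool exhausted
}
```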
“…Following the optimization guide that is distributed with the Quasar platform [35], our presented solution avoids the use of dynamic memory such as described in [36] and dynamic parallelism, and instead operates on fixed-size buffers and vectors. As it has been described in previous sections, the SMoE algorithm requires the inverse of the covariance matrix R j to compute each component weight, and again to compute the final pixel color reconstruction.…”
Section: Quasar Implementation
confidence: 99%
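
A minimal CUDA sketch of the fixed-size-buffer alternative described in that statement: each thread inverts its component's covariance matrix in a small local array instead of allocating from the device heap. The 2x2 matrix size, the row-major layout, and the kernel name are assumptions for illustration, not the cited Quasar implementation.

```cuda
// Minimal sketch: per-component 2x2 covariance inverse using fixed-size local
// storage, so the kernel needs no device-side malloc at all.
#include <cuda_runtime.h>
#include <math.h>

__global__ void invertCovariances(const float *R, float *Rinv, int numComponents)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= numComponents) return;

    // Fixed-size scratch lives in registers/local memory: no heap contention.
    float m[4];
    for (int k = 0; k < 4; ++k)
        m[k] = R[4 * j + k];                 // row-major 2x2 matrix

    float det = m[0] * m[3] - m[1] * m[2];
    if (fabsf(det) < 1e-12f)
        det = 1e-12f;                        // guard against singular matrices

    float inv = 1.0f / det;
    Rinv[4 * j + 0] =  m[3] * inv;
    Rinv[4 * j + 1] = -m[1] * inv;
    Rinv[4 * j + 2] = -m[2] * inv;
    Rinv[4 * j + 3] =  m[0] * inv;
}
```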