Abstract: This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achieves high speed by efficiently utilizing the parallelism of the GPU throughout the whole algorithm. Initially, a parallel bucketsort splits the list into enough sublists, which are then sorted in parallel using mergesort. The parallel bucketsort, implemented in NVIDIA's CUDA, utilizes the synchronization mechanisms, such as atomic increment, that are available on modern GPUs. The mergesort requires scattered writing, which is exposed by CUDA and ATI's Data Parallel Virtual Machine [1]. For lists with more than 512k elements, the algorithm performs better than the bitonic sort algorithms, which have been considered to be the fastest for GPU sorting, and is more than twice as fast for 8M elements. It is 6-14 times faster than single-CPU quicksort for 1-8M elements, respectively. In addition, the new GPU algorithm sorts in n log n time, as opposed to the standard n(log n)^2 for bitonic sort. Recently, it was shown how to implement GPU-based radix sort, of complexity n log n, to outperform bitonic sort. That algorithm is, however, still up to about 40% slower for 8M elements than the hybrid algorithm presented in this paper. GPU sorting is memory bound, and a key to the high performance is that the mergesort works on groups of four-float values to lower the number of memory fetches. Finally, we demonstrate the performance on sorting vertex distances for two large 3D models, a key step in, for instance, achieving correct transparency.
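The two-phase structure described in this abstract (bucket split, then an independent mergesort per sublist) can be illustrated with a small sequential sketch. This is not the paper's CUDA implementation: the uniform-width bucket rule, the function names `bucket_split`, `merge_sort`, and `hybrid_sort`, and the default of 8 buckets are assumptions chosen for illustration, and the per-bucket sorts that would run in parallel on a GPU are simply run one after another here.

```python
def bucket_split(values, num_buckets):
    """Split values into ordered buckets: every item in bucket i <= every item in bucket i+1.

    Assumption: uniform-width buckets over [min, max]; the paper's pivot
    selection may differ.
    """
    buckets = [[] for _ in range(num_buckets)]
    if not values:
        return buckets
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets
    if width == 0:
        # All values (nearly) equal: a single bucket is enough.
        buckets[0] = list(values)
        return buckets
    for v in values:
        idx = min(int((v - lo) / width), num_buckets - 1)
        buckets[idx].append(v)
    return buckets


def merge_sort(values):
    """Plain top-down mergesort on one sublist."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left, right = merge_sort(values[:mid]), merge_sort(values[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def hybrid_sort(values, num_buckets=8):
    """Bucket split followed by an independent mergesort of each bucket."""
    result = []
    for bucket in bucket_split(values, num_buckets):
        # On a GPU each bucket could be sorted concurrently; here it is sequential.
        result.extend(merge_sort(bucket))
    return result
```

Because the buckets are already ordered relative to one another, concatenating the sorted buckets yields the fully sorted list without a final merge step.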
Figure 1: The EPICCITADEL scene voxelized to a 128K^3 (131 072^3) resolution and stored as a Sparse Voxel DAG. Total voxel count is 19 billion, which requires 945MB of GPU memory. A sparse voxel octree would require 5.1GB without counting pointers. Primary shading is from triangle rasterization, while ambient occlusion and shadows are raytraced in the sparse voxel DAG at 170 MRays/sec and 240 MRays/sec respectively, on an NVIDIA GTX680.
Abstract: We show that a binary voxel grid can be represented orders of magnitude more efficiently than using a sparse voxel octree (SVO) by generalising the tree to a directed acyclic graph (DAG). While the SVO allows for efficient encoding of empty regions of space, the DAG additionally allows for efficient encoding of identical regions of space, as nodes are allowed to share pointers to identical subtrees. We present an efficient bottom-up algorithm that reduces an SVO to a minimal DAG, which can be applied even in cases where the complete SVO would not fit in memory. In all tested scenes, even the highly irregular ones, the number of nodes is reduced by one to three orders of magnitude. While the DAG requires more pointers per node, the memory cost for these is quickly amortized and the memory consumption of the DAG is considerably smaller, even when compared to an ideal SVO without pointers. Meanwhile, our sparse voxel DAG requires no decompression and can be traversed very efficiently. We demonstrate this by ray tracing hard and soft shadows, ambient occlusion, and primary rays in extremely high resolution DAGs at speeds that are on par with, or even faster than, state-of-the-art voxel and triangle GPU ray tracing.
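The idea of sharing identical subtrees can be sketched in a few lines. The following is a minimal illustration, not the paper's level-by-level algorithm: it assumes a toy octree encoding (an `int` occupancy mask for a leaf, a tuple of 8 children for an inner node, `None` for empty space) and uses a recursive post-order pass with a lookup table, so that two subtrees with identical content receive the same node id. The names `reduce_to_dag` and `count_tree_nodes` are hypothetical.

```python
def reduce_to_dag(node, table, nodes):
    """Return the DAG node id for `node`, reusing ids for identical subtrees.

    table: maps a canonical key to its node id.
    nodes: list of canonical keys, indexed by node id (the resulting DAG).
    """
    if node is None:
        return None  # empty space needs no node
    if isinstance(node, int):
        key = ("leaf", node)  # leaf: 8-bit occupancy mask (assumed encoding)
    else:
        # Children are reduced first, so identical subtrees yield identical keys.
        key = ("inner", tuple(reduce_to_dag(c, table, nodes) for c in node))
    if key not in table:
        table[key] = len(nodes)
        nodes.append(key)
    return table[key]


def count_tree_nodes(node):
    """Number of nodes in the original (unshared) octree."""
    if node is None:
        return 0
    if isinstance(node, int):
        return 1
    return 1 + sum(count_tree_nodes(c) for c in node)


# Tiny example: two identical subtrees under the root.
leaf = 0b10101010
sub = (leaf, None, None, None, None, None, None, leaf)
root = (sub, sub, None, None, None, None, None, None)

dag_nodes = []
root_id = reduce_to_dag(root, {}, dag_nodes)
```

In this toy scene the tree has 7 nodes (one root, two subtrees, four leaves), while the DAG stores only 3 (one leaf, one subtree, one root), since both subtrees and all four leaves are identical.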
This paper introduces an accurate real-time soft shadow algorithm that uses sample-based visibility. Initially, we present a GPU-based alias-free hard shadow map algorithm that typically requires only a single render pass from the light, in contrast to using depth peeling and one pass per layer. For closed objects, we also eliminate the need for a bias. The method is extended to soft shadow sampling for an arbitrarily shaped area or volumetric light source using 128-1024 light samples per screen pixel. The alias-free shadow map guarantees that the visibility is accurately sampled per screen-space pixel, even for arbitrarily shaped (e.g. non-planar) surfaces or solid objects. Another contribution is a smooth coherent shading model that avoids the light leakage commonly seen near shadow borders due to normal interpolation.
We explore the problem of decoupling color information from geometry in large scenes of voxelized surfaces and of compressing the array of colors without introducing disturbing artifacts. In this extension of our I3D paper with the same title, we first present a novel method for connecting each node in a sparse voxel DAG to its corresponding colors in a separate 1D array of colors, with very little additional information stored to the DAG. Then, we show that by mapping the 1D array of colors onto a 2D image using a space-filling curve, we can achieve high compression rates and good quality using conventional, modern, hardware-accelerated texture compression formats such as ASTC or BC7. We additionally explore whether this method can be used to compress voxel colors for off-line storage and network transmission using conventional off-line compression formats such as JPG and JPG2K. For real-time decompression, we suggest a novel variable bitrate block encoding that consistently outperforms previous work, often achieving two times the compression at equal quality.
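The mapping of a 1D color array onto a 2D image along a space-filling curve can be sketched as follows. The abstract does not say which curve is used; this sketch assumes a Morton (Z-order) curve because it is simple to compute with bit operations. The names `compact_bits`, `morton_decode`, and `colors_to_image` are hypothetical, and the image side is assumed to be a power of two with `side * side >= len(colors)`.

```python
def compact_bits(v):
    """Gather the even-positioned bits of a 32-bit value into the low 16 bits."""
    v &= 0x55555555
    v = (v | (v >> 1)) & 0x33333333
    v = (v | (v >> 2)) & 0x0F0F0F0F
    v = (v | (v >> 4)) & 0x00FF00FF
    v = (v | (v >> 8)) & 0x0000FFFF
    return v


def morton_decode(index):
    """Map a 1D Morton index to (x, y): even bits form x, odd bits form y."""
    return compact_bits(index), compact_bits(index >> 1)


def colors_to_image(colors, side):
    """Place a 1D list of colors into a side x side grid along the Morton curve.

    Assumes side is a power of two and side * side >= len(colors);
    unused texels stay None.
    """
    image = [[None] * side for _ in range(side)]
    for i, color in enumerate(colors):
        x, y = morton_decode(i)
        image[y][x] = color
    return image
```

The point of such a curve is that colors that are close together in the 1D array stay close together in the image, which is what block-based texture compressors rely on.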
Figure 1: The woman renders at 37.3 fps using 20k hair strands (300k line segments). The dog renders at 17.2 fps using 400k hair strands (2M line segments).
Abstract: This paper presents a method for quickly constructing a high-quality approximate visibility function for high-frequency semi-transparent geometry such as hair. We can then reconstruct the visibility for any fragment without the expensive compression needed by Deep Shadow Maps and with a quality that is much better than what is attainable at similar framerates using Opacity Maps or Deep Opacity Maps. The memory footprint of our method is also considerably lower than that of previous methods. We then use a similar method to achieve back-to-front sorted alpha blending of the fragments, with results that are virtually indistinguishable from depth peeling and an order of magnitude faster.