2016 XLII Latin American Computing Conference (CLEI)
DOI: 10.1109/clei.2016.7833394
Potential benefits of a block-space GPU approach for discrete tetrahedral domains

Abstract: The study of data-parallel domain re-organization and thread-mapping techniques is a relevant topic, as these techniques can increase the efficiency of GPU computations on spatially discrete domains with non-box-shaped geometry. In this work we study the potential benefits of applying a succinct data re-organization of a tetrahedral data-parallel domain of size O(n^3), combined with an efficient block-space GPU map of the form g(λ) : ℕ → ℕ^3. Results from the analysis suggest that in theory the combination of th…
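The truncated abstract does not reproduce the g(λ) map itself, so the following is only a plausible sketch of the underlying idea: mapping a linear index λ into 3-D coordinates of a discrete tetrahedron of side n, which holds n(n+1)(n+2)/6 elements rather than the n^3 of its bounding box. The function names `tetra`, `tri`, and `g` are illustrative, and the cube-root layer estimate refined by a small search stands in for the paper's actual closed-form derivation.

```python
def tetra(k):
    """k-th tetrahedral number: elements in a discrete tetrahedron of side k."""
    return k * (k + 1) * (k + 2) // 6

def tri(k):
    """k-th triangular number: elements in a discrete triangle of side k."""
    return k * (k + 1) // 2

def g(lam):
    """Map a linear index lam to (x, y, z) with x <= y <= z inside the
    tetrahedral domain. The z-layer is estimated from tetra(k) ~ k^3 / 6
    via a cube root, then corrected by a short search; the same idea is
    applied with a square root inside the triangular layer."""
    z = round((6 * lam) ** (1.0 / 3.0))        # initial layer estimate
    while tetra(z + 1) <= lam:                 # correct estimate upward
        z += 1
    while tetra(z) > lam:                      # or downward
        z -= 1
    r = lam - tetra(z)                         # remainder within the z-layer
    y = round((2 * r) ** 0.5)                  # row estimate inside triangle
    while tri(y + 1) <= r:
        y += 1
    while tri(y) > r:
        y -= 1
    x = r - tri(y)
    return (x, y, z)
```

On a GPU the same computation would be done per thread from its block/thread index, so that the grid covers only the tetra(n) useful elements instead of the full n^3 bounding box.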

Cited by 6 publications (4 citation statements)
References 12 publications
“…Navarro et al. [11,12,13] proposed a GPU block-space mapping for 2- and 3-simplex domains based on a linear numbering of discrete elements. The authors report an empirical speedup of up to 1.5× and 2.3× over a bounding-box approach, for 2- and 3-simplices, respectively.…”
Section: GPU Processing in Complex Domains
confidence: 99%
“…Thanks to the 40 GB of GPU memory of the A100, it was possible to push the maximum problem size up to 2^16, except for the curve of S_{1x1}, which could not reach the maximum size because of CUDA's grid size limits. The TITAN RTX under-performed in comparison with the other GPUs for n ≤ 2^12. Past that value, speedup increases above 1 for all values of ρ, reaching a top speedup of ~3.2×.…”
Section: Performance Plots
confidence: 99%
“…Ries et al. [15] developed in 2009 a method to compute the inverse of triangular matrices on the GPU by means of a recursive parallel space mapping from a compact rectangular domain. Navarro et al. proposed in 2014 a GPU block-space mapping for 2-simplex and 3-simplex shaped data [9,8,10]. Navarro et al. [7,11] extended the idea of GPU thread mapping to fractal domains, proposing the λ(ω) map for NBB fractals.…”
Section: Related Work
confidence: 99%
“…Here, n ∈ ℕ is the linear size of the fractal along one axis, k ∈ ℕ the number of self-similar replicas to generate for the next scale level, and s ∈ ℕ the growth ratio of n in the next scale level, along an axis. For example, the Sierpiński Carpet (Figure 1) is F^{8,3}_n and the Sierpiński Triangle (Figure 2) is F^{3,2}_n. Many different NBB fractals can be described using the same parameters.…”
confidence: 99%
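The parameters (n, k, s) quoted above also determine how sparsely such a fractal fills its bounding box, which is what motivates specialized thread maps. A small sketch under stated assumptions: the function name `fractal_stats` is illustrative, and the square bounding box (side^2 cells) is an assumption that holds for 2-D examples like the Carpet and the Triangle.

```python
def fractal_stats(k, s, levels):
    """For an NBB fractal F^{k,s}: at scale level l the linear size is s**l
    and the element count is k**l (k self-similar replicas per level).
    Returns a list of (side, elements, bounding_box_cells), one per level,
    assuming a square (2-D) bounding box of side**2 cells."""
    out = []
    for l in range(levels + 1):
        side = s ** l
        out.append((side, k ** l, side * side))
    return out

# Sierpinski Carpet F^{8,3}: at level 2 the side is 9 but only 64 of 81 cells hold data.
carpet = fractal_stats(8, 3, 2)
# Sierpinski Triangle F^{3,2}: at level 3, 27 elements sit in a 64-cell box.
triangle = fractal_stats(3, 2, 3)
```

The useful fraction shrinks as (k/s^2)^l per level, so a bounding-box thread map wastes a growing share of threads — the gap a map such as λ(ω) is designed to close.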