20th Annual International Conference on High Performance Computing 2013
DOI: 10.1109/hipc.2013.6799120

A memory efficient algorithm for adaptive multidimensional integration with multiple GPUs

Abstract: We present a memory-efficient algorithm and its implementation for solving multidimensional numerical integration on a cluster of compute nodes with multiple GPU devices per node. Effective use of shared memory is important for improving performance on GPUs because of the bandwidth limitation of global memory. The best-known sequential algorithm for multidimensional numerical integration, CUHRE, uses a large dynamic heap data structure that is accessed frequently. Devising a GPU algorithm …

Citations: cited by 8 publications (7 citation statements)
References: 12 publications

“…In the works of Arumugam et al, a parallel algorithm with a deterministic adaptive strategy for the multidimensional integration on GPUs only is presented. The authors focus their attention to the optimization techniques to implement a two‐step procedure: in the first step the algorithm generates a list of sub‐domains of the integration region that are then processed in the second step by using the GPU.…”
Section: Related Work (mentioning)
confidence: 99%
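
The two-step procedure described in this statement can be illustrated with a minimal sketch. This is not the implementation of Arumugam et al.: it assumes a plain midpoint rule with a coarse-versus-refined error estimate in place of the CUHRE cubature rules, and it runs the second step as a sequential loop where the original work uses the GPU. All function names and the `levels`, `tol`, and `max_depth` parameters are hypothetical.

```python
import itertools
import math


def midpoint_estimate(f, lo, hi):
    """Midpoint-rule estimate of the integral of f over the box [lo, hi]."""
    center = [(a + b) / 2 for a, b in zip(lo, hi)]
    volume = math.prod(b - a for a, b in zip(lo, hi))
    return f(center) * volume


def split_box(lo, hi):
    """Split a box into 2^n children by halving every axis."""
    mid = [(a + b) / 2 for a, b in zip(lo, hi)]
    children = []
    for choice in itertools.product((0, 1), repeat=len(lo)):
        c_lo = [mid[i] if c else lo[i] for i, c in enumerate(choice)]
        c_hi = [hi[i] if c else mid[i] for i, c in enumerate(choice)]
        children.append((c_lo, c_hi))
    return children


def estimate_with_error(f, lo, hi):
    """Return (estimate, error): refined estimate and its gap to the coarse one."""
    coarse = midpoint_estimate(f, lo, hi)
    fine = sum(midpoint_estimate(f, c_lo, c_hi) for c_lo, c_hi in split_box(lo, hi))
    return fine, abs(fine - coarse)


def phase_one(lo, hi, levels):
    """Step 1: generate the list of sub-domains by uniform splitting."""
    boxes = [(list(lo), list(hi))]
    for _ in range(levels):
        boxes = [child for box in boxes for child in split_box(*box)]
    return boxes


def phase_two(f, box, tol, max_depth=12):
    """Step 2: adaptively refine one sub-domain, independently of the others."""
    stack = [(box[0], box[1], 0)]
    total = 0.0
    while stack:
        lo, hi, depth = stack.pop()
        est, err = estimate_with_error(f, lo, hi)
        if err <= tol or depth >= max_depth:
            total += est  # accept this region
        else:
            stack.extend((c_lo, c_hi, depth + 1) for c_lo, c_hi in split_box(lo, hi))
    return total


def integrate(f, lo, hi, tol=1e-6, levels=2):
    """Run both steps; each sub-domain gets an equal share of the tolerance."""
    boxes = phase_one(lo, hi, levels)
    per_box_tol = tol / len(boxes)
    # In a GPU setting, each call to phase_two would be an independent work item.
    return sum(phase_two(f, box, per_box_tol) for box in boxes)


if __name__ == "__main__":
    print(integrate(lambda p: p[0] ** 2 + p[1] ** 2, [0.0, 0.0], [1.0, 1.0]))
```

Because the sub-domains from the first step do not depend on each other, the second step is the part that maps naturally onto parallel hardware. The tolerance here is a per-region acceptance threshold, not a guaranteed bound on the total error.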
“…n , along the outer dimension. Adaptive quadrature is traditionally used to compute such partitions, which, as illustrated in [3,4], is characterized by control-flow and memory access irregularities that leads to severe performance bottlenecks on GPU architectures.…”
Section: Forecasting Control-flow (mentioning)
confidence: 99%
“…Once the partition is computed, integral estimate is calculated using Equation 14. However, in our proposed approach, a single unique partition per class that combines the partition of rp-integral at all grid points of that particular class is calculated using heuristics instead of using traditional adaptive quadrature methods on each point. The main motivations for calculating such unique partition for a group of points instead of individual grid-point is that it eliminates the need for adaptive quadrature or data-dependent control-flow on each integral evaluation, which, as illustrated in [3,4,5], is the main performance bottleneck for such adaptive computations on SIMD architectures. The procedure RP-IntegralPartition implements this heuristics approach, where for each class c ∈ C, it generates a unique partition P[1..P.length] that denotes a rp-integral partition along the outer integration domain (r-domain). Ideally, P should be a combination of the partitions generated by rp-integral at all p ∈ c. However, computing such partition per class instead of individual grid-point is computationally challenging due to the data-dependent, and irregular control-flow behavior of different rp-integrals.…”
mentioning
confidence: 99%
“…Unfortunately, this approach is infeasible for higher dimensions as $d^n$ grows exponentially with $n$. For example if $n = 10$ and we need to split each dimension into $d = 20$ parts the number of sub-regions created would be $20^{10}$ which is roughly $10^{13}$. Moreover, uniform division of the integration region is not the best way to estimate the integral.…”
Section: Introduction (mentioning)
confidence: 99%
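
The arithmetic in this statement is easy to check directly. The short sketch below counts the sub-regions produced by splitting each of n dimensions into d equal parts. The function name `uniform_subregions` is hypothetical and only serves the illustration.

```python
def uniform_subregions(d, n):
    """Number of sub-regions when each of n dimensions is split into d parts."""
    return d ** n


if __name__ == "__main__":
    # The case quoted in the statement: d = 20 parts in n = 10 dimensions.
    count = uniform_subregions(20, 10)
    print(count)           # 10240000000000
    print(f"{count:.3e}")  # 1.024e+13, i.e. roughly 10^13

    # Growth with the dimension for a fixed d = 20.
    for n in (2, 5, 10):
        print(n, uniform_subregions(20, n))
```

Even at one function evaluation per sub-region, a count of this size shows why the quoted text rules out uniform division in high dimensions.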
“…We propose a new deterministic, parallel adaptive algorithm for multi-dimensional integration for massively parallel architectures. It is inspired by the Cuhre method of the Cuba library first introduced in [2] and its parallel GPU-adaptation [6] [10]. Unlike other parallel methods such as [6] [10], the proposed PAGANI algorithm does not utilize the common sequential scheme seen in adaptive integration.…”
Section: Introduction (mentioning)
confidence: 99%