Pairwise Computations on the Cell Processor with Applications in Computational Biology

Sarje, Abhinav; Żola, Jarosław; Aluru, Srinivas

doi:10.1201/b10376-29

Cited by 50 publications

(57 citation statements)

References 2 publications

“…As we can see in Figure 5, the total memory requirement is high due to temporary requirements in the segmented scan [18] operation on the earlier GPU. The extra memory required is of the size 3 × nnz × 81 × 4 bytes which is used to store the data, flag and the final output arrays for the segmented scan operation.…”

Section: Memory Requirements (mentioning)

confidence: 97%

Practical Time Bundle Adjustment for 3D Reconstruction on the GPU

Choudhary

Gupta

Narayanan

2012

Trends and Topics in Computer Vision

Abstract. Large-scale 3D reconstruction has received a lot of attention recently. Bundle adjustment is a key component of the reconstruction pipeline and often its slowest and most computationally intensive step. It has not been parallelized effectively so far. In this paper, we present a hybrid implementation of sparse bundle adjustment on the GPU using CUDA, with the CPU working in parallel. The algorithm is decomposed into smaller steps, each of which is scheduled on the GPU or the CPU. We develop efficient kernels for the steps and make use of existing libraries for several steps. Our implementation achieves a speedup of 30-40 times over the standard CPU implementation for datasets with up to 500 images on an Nvidia Tesla C2050 GPU.
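The memory estimate quoted in this citation statement (3 × nnz × 81 × 4 bytes for the data, flag and output arrays) can be illustrated with a small sketch. The code below is a plain-Python model of a segmented inclusive scan, not the authors' CUDA kernel; the function names `segmented_inclusive_scan` and `scan_scratch_bytes` and the sample inputs are hypothetical, and the 4-byte element size is assumed to mean single-precision floats.

```python
def segmented_inclusive_scan(data, flags):
    """Inclusive sum scan that restarts wherever flags[i] is 1 (segment head)."""
    out = []
    running = 0.0
    for value, flag in zip(data, flags):
        # A set flag starts a new segment, so the running sum is reset.
        running = value if flag else running + value
        out.append(running)
    return out


def scan_scratch_bytes(nnz, values_per_block=81, bytes_per_value=4, arrays=3):
    """Temporary storage for the data, flag and output arrays, per the quoted formula."""
    return arrays * nnz * values_per_block * bytes_per_value


# Three segments: [1, 2, 3], [4, 5], [6]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
flags = [1, 0, 0, 1, 0, 1]
result = segmented_inclusive_scan(data, flags)

# Scratch memory for a hypothetical matrix with 1000 nonzero blocks
scratch = scan_scratch_bytes(nnz=1000)
```

The last element of each segment in `result` holds that segment's total, which is why the scan needs a full-size output array in addition to the data and flag arrays.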

“…The summation is faster when using a segmented scan [18] on Tesla S1070 whereas a shared memory reduction is faster on the Fermi GPU. The memory space required to store U is cnp×cnp×m×4 bytes.…”

Section: Computation Of U (mentioning)

confidence: 99%

Practical Time Bundle Adjustment for 3D Reconstruction on the GPU

Choudhary

Gupta

Narayanan

2012

Trends and Topics in Computer Vision
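The statement above contrasts a segmented scan with a shared-memory reduction and gives the storage for U as cnp × cnp × m × 4 bytes. The following is a rough CPU-side sketch of both ideas. It assumes, following common sparse bundle adjustment notation that the quote itself does not spell out, that `cnp` is the number of parameters per camera and `m` is the number of cameras; `tree_reduce` and `u_bytes` are hypothetical names, and the loop only mimics the halving access pattern of a GPU reduction.

```python
def tree_reduce(values):
    """Sum values by repeated halving, the access pattern of a shared-memory reduction."""
    buf = list(values)
    n = len(buf)
    while n > 1:
        half = (n + 1) // 2
        # Each "thread" i adds the element half a stride away into its own slot.
        for i in range(n // 2):
            buf[i] += buf[i + half]
        n = half
    return buf[0] if buf else 0.0


def u_bytes(cnp, m, bytes_per_value=4):
    """Storage for U following the quoted formula cnp x cnp x m x 4 bytes."""
    return cnp * cnp * m * bytes_per_value


total = tree_reduce([1.0, 2.0, 3.0, 4.0, 5.0])

# Assumed example sizes: 9 parameters per camera, 500 cameras
u_size = u_bytes(cnp=9, m=500)
```

On a GPU each pass of the `while` loop would be followed by a synchronization, so the reduction finishes in roughly log2(n) steps rather than n.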

“…According to the Law of Large Numbers: [equation (12) not recovered] where N is the number of chains, θ*_r is the value of θ* in the r-th chain, and ε is the probable error.…”

Section: GPU-based Monte Carlo Methods For Solving Linear Algebraic E (mentioning)

confidence: 99%

GPU-Based Monte Carlo Methods for Solving Linear Algebraic Equations

Lai

2016

Proceedings of the Fourth International Conference on Information Science and Cloud Computing — PoS(ISCC2015)

Many engineering, physics, chemistry, and computer science problems involve solving systems of linear algebraic equations (SLAE), an important problem in scientific computing. In this paper we study Monte Carlo methods (MCMs) for solving SLAE and use the graphics processing unit (GPU) to accelerate them. Numerical experiments show that the GPU is well suited to this application, with the speedup reaching up to 50X as the problem size increases. ISCC 2015, 18-19 December.
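The citation statement above refers to averaging the values of θ* over N chains and bounding the result by a probable error ε, but the equation itself is missing from the snippet. As a hedged sketch only, the code below uses the textbook Monte Carlo estimate: the sample mean over the chains, with the probable error taken as 0.6745 times the standard error. Whether this matches the cited paper's equation (12) is an assumption, and `chain_estimate` is a hypothetical name.

```python
import math


def chain_estimate(chain_values):
    """Return (mean, probable_error) for the per-chain values of theta*."""
    n = len(chain_values)
    if n < 2:
        raise ValueError("need at least two chains to estimate the error")
    mean = sum(chain_values) / n
    # Unbiased sample variance of the per-chain values.
    variance = sum((v - mean) ** 2 for v in chain_values) / (n - 1)
    # 0.6745 is the 75th-percentile point of the standard normal distribution,
    # so the true value lies within +/- probable_error with probability about 0.5.
    probable_error = 0.6745 * math.sqrt(variance / n)
    return mean, probable_error


theta_mean, epsilon = chain_estimate([1.0, 2.0, 3.0, 4.0])
```

Because the error shrinks as 1/sqrt(N), halving ε requires about four times as many chains, which is the kind of independent, repeated work that maps well onto GPU threads.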

“…Exploitation of warps is recommended in the CUDA programming guide [16], and efficient algorithms have been developed that depend on this feature: Sengupta et al. show that the number of barrier synchronization operations required during a parallel scan can be reduced from log_2(N) to log_32(N), where N is the number of threads, by first scanning within warps, using implicit synchronization, and then aggregating across warps [17]. The OpenCL programming model aims to be general purpose and thus does not acknowledge the existence of warps, so relying on platform-specific warp behavior leads to non-portable code.…”

Section: GPU Kernel Programming Model (mentioning)

confidence: 99%

Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels

Bardsley

Donaldson

2014

Lecture Notes in Computer Science

Abstract. We describe the design and implementation of methods to support reasoning about data races in GPU kernels where constructs other than the standard barrier primitive are used for synchronization. At one extreme we consider kernels that exploit implicit, coarse-grained synchronization between threads in the same warp, a feature provided by many architectures. At the other extreme we consider kernels that reduce or avoid barrier synchronization through the use of atomic operations. We discuss design decisions associated with providing support for warps and atomics in GPUVerify, a formal verification tool for OpenCL and CUDA kernels. We evaluate the practical impact of these design decisions using a large set of benchmarks, showing that warps can be supported in a scalable manner, that a coarse abstraction suffices for efficient reasoning about most practical uses of atomic operations, and that a novel, refined abstraction captures an important design pattern where atomic operations are used to compute unique array indices. Our evaluation revealed two previously unknown bugs in publicly available benchmark suites.
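The two-level scan described in the citation statement for this paper (scan within warps first, then aggregate across warps) can be sketched on the CPU as follows. This is a sequential simulation for illustration, not GPU code and not the implementation from the cited work; `warp_then_block_scan` and `barrier_rounds` are hypothetical names, and a warp size of 32 is assumed.

```python
from itertools import accumulate

WARP_SIZE = 32  # assumed warp width


def inclusive_scan(values):
    """Inclusive prefix sum of a list."""
    return list(accumulate(values))


def warp_then_block_scan(values, warp_size=WARP_SIZE):
    """Inclusive scan done in two levels: within warps, then across warp totals."""
    warps = [values[i:i + warp_size] for i in range(0, len(values), warp_size)]
    # Level 1: each warp scans its own slice (no explicit barrier on real hardware).
    scanned = [inclusive_scan(w) for w in warps]
    # Level 2: exclusive scan of the per-warp totals gives each warp's offset.
    totals = [s[-1] for s in scanned]
    offsets = [0] + inclusive_scan(totals)[:-1]
    return [x + off for s, off in zip(scanned, offsets) for x in s]


def barrier_rounds(n, width):
    """Smallest r with width**r >= n, i.e. ceil(log_width(n)) in integer arithmetic."""
    rounds, span = 0, 1
    while span < n:
        span *= width
        rounds += 1
    return rounds


values = list(range(1, 101))
two_level = warp_then_block_scan(values)
naive_rounds = barrier_rounds(1024, 2)
warp_rounds = barrier_rounds(1024, WARP_SIZE)
```

For 1024 threads the round count drops from 10 to 2 in this model, which is the log_2(N) versus log_32(N) difference quoted above.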
