Scientific Computing With Multicore and Accelerators 2010
DOI: 10.1201/b10376-29

Pairwise Computations on the Cell Processor with Applications in Computational Biology

Citations: cited by 50 publications (57 citation statements)
References: 2 publications
“…As we can see in Figure 5, the total memory requirement is high due to temporary requirements in the segmented scan [18] operation on the earlier GPU. The extra memory required is of the size 3 × nnz × 81 × 4 bytes which is used to store the data, flag and the final output arrays for the segmented scan operation.…”
Section: Memory Requirements (mentioning)
confidence: 97%
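
To make the memory figure in the statement above concrete, here is a minimal sequential sketch of a segmented scan and of the scratch-space arithmetic. It is a reference model only, not the GPU implementation the citing paper used. The function names and the example `nnz` value are hypothetical, and the 4 bytes per element is assumed to mean single-precision floats.

```python
from typing import List

BYTES_PER_ELEMENT = 4  # assumption: single precision, matching "x 4 bytes" in the quote


def segmented_inclusive_scan(data: List[float], flags: List[int]) -> List[float]:
    """Inclusive running sum that restarts wherever flags[i] == 1.

    Sequential reference model; a GPU version would compute the same
    result in parallel using the data, flag and output arrays.
    """
    if len(data) != len(flags):
        raise ValueError("data and flags must have the same length")
    out = []
    running = 0.0
    for value, flag in zip(data, flags):
        if flag:  # a set flag marks the first element of a new segment
            running = 0.0
        running += value
        out.append(running)
    return out


def scan_scratch_bytes(nnz: int, block_elems: int = 81) -> int:
    """Extra memory quoted above: three arrays (data, flags, output),
    each holding nnz * block_elems elements of 4 bytes."""
    return 3 * nnz * block_elems * BYTES_PER_ELEMENT


if __name__ == "__main__":
    print(segmented_inclusive_scan([1, 2, 3, 4, 5], [1, 0, 1, 0, 0]))
    print(scan_scratch_bytes(1000))  # hypothetical nnz
```

With a hypothetical `nnz` of 1000, the three temporary arrays would occupy 3 × 1000 × 81 × 4 = 972,000 bytes.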
“…The summation is faster when using a segmented scan [18] on Tesla S1070 whereas a shared memory reduction is faster on the Fermi GPU. The memory space required to store U is cnp×cnp×m×4 bytes.…”
Section: Computation Of U (mentioning)
confidence: 99%
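
The statement above contrasts a segmented scan with a shared-memory reduction and gives a storage formula for U. The sketch below models the pairwise (tree) summation pattern that a shared-memory reduction typically follows, and evaluates the quoted formula. It runs sequentially on the CPU, the function names are hypothetical, and the values passed for `cnp` and `m` are illustrative only, since the quote does not define them.

```python
from typing import Sequence


def tree_reduce_sum(values: Sequence[float]) -> float:
    """Sum by repeatedly folding the upper half of the buffer onto the lower half.

    On a GPU each fold would be one parallel step over shared memory;
    here the folds run sequentially to show the access pattern.
    """
    buf = list(values)
    n = len(buf)
    if n == 0:
        return 0.0
    while n > 1:
        half = (n + 1) // 2          # size of the surviving lower part
        for i in range(n - half):    # fold element i + half onto element i
            buf[i] += buf[i + half]
        n = half
    return buf[0]


def u_storage_bytes(cnp: int, m: int, bytes_per_element: int = 4) -> int:
    """Storage for U as quoted: cnp x cnp x m x 4 bytes."""
    return cnp * cnp * m * bytes_per_element


if __name__ == "__main__":
    print(tree_reduce_sum([1, 2, 3, 4, 5]))
    print(u_storage_bytes(cnp=9, m=10))  # illustrative values only
```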
“…According to Laws of Large Number: [equation (12)] where N is the number of chains, θ*_r is the value of θ* in the r-th chain and ε is the probable error.…”
Section: Gpu-based Monte Carlo Methods For Solving Linear Algebraic E… (mentioning)
confidence: 99%
“…Exploitation of warps is recommended in the CUDA programming guide [16], and efficient algorithms have been developed that depend on this feature: Sengupta et al. show the number of barrier synchronization operations required during a parallel scan can be reduced from log_2(N) to log_32(N), where N is the number of threads, by first scanning within warps, using implicit synchronization, and then aggregating across warps [17]. The OpenCL programming model aims to be general purpose and thus does not acknowledge the existence of warps, so relying on platform-specific warp behavior leads to non-portable code.…”
Section: Gpu Kernel Programming Model (mentioning)
confidence: 99%
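
The two-level idea in the statement above (scan within warps, then aggregate across warps) can be sketched as follows. This is a sequential CPU model, not the algorithm of Sengupta et al.: it uses a single aggregation level, there is no real synchronization, and the names `levels` and `warp_style_scan` are hypothetical. The `levels` helper only illustrates how the number of scan stages shrinks when the radix grows from 2 to the warp size of 32.

```python
from typing import List

WARP_SIZE = 32  # warp width on NVIDIA GPUs, as implied by log_32 in the quote


def levels(n: int, radix: int) -> int:
    """Smallest k such that radix**k >= n, i.e. ceil(log_radix(n)) for n >= 1."""
    k, span = 0, 1
    while span < n:
        span *= radix
        k += 1
    return k


def warp_style_scan(values: List[int], warp_size: int = WARP_SIZE) -> List[int]:
    """Inclusive sum scan done in three steps that mirror the warp-based scheme."""
    # Step 1: independent inclusive scan inside each warp-sized chunk
    # (on a GPU this part relies on implicit intra-warp synchronization).
    scanned = []
    for start in range(0, len(values), warp_size):
        running, row = 0, []
        for v in values[start:start + warp_size]:
            running += v
            row.append(running)
        scanned.append(row)

    # Step 2: exclusive scan over the per-chunk totals.
    offsets, running = [], 0
    for row in scanned:
        offsets.append(running)
        running += row[-1]

    # Step 3: add each chunk's offset to all of its elements.
    return [x + off for row, off in zip(scanned, offsets) for x in row]


if __name__ == "__main__":
    result = warp_style_scan(list(range(1, 101)))
    print(result[-1])
    print(levels(1024, 2), levels(1024, 32))
```

For N = 1024 threads, the model gives 10 stages at radix 2 versus 2 stages at radix 32, which is the reduction the quoted passage describes.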