Analysis and performance estimation of the Conjugate Gradient method on multiple GPUs
2012
DOI: 10.1016/j.parco.2012.07.002

Cited by 37 publications (24 citation statements)
References 19 publications
“…Comparing their implementation against the one by Buatois et al., they achieved 3.7× higher average performance over the full set of matrices, although individual performance was worse in 33% of the test cases. Neglecting the three test cases of extremely poor performance mentioned above, the advantage of [17] reached a notable 6.1× over the Buatois et al. implementation.…”
Section: Related Work
confidence: 82%
“…Verschoor and Jalba [17] also aimed at improving the SpMV using the BCSR format, in their case by analyzing the effect of certain reorderings of the blocks. These authors evaluated the total speed-up, considering the average execution time over all the test cases, and reported that their implementation was on average 1.25× faster than Bell and Garland's hybrid format-based implementation.…”
Section: Related Work
confidence: 99%
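The BCSR (Block Compressed Sparse Row) layout mentioned above stores small dense b×b blocks contiguously, so each block contributes a small dense matrix-vector product. As a minimal sketch of an SpMV over that format (the function and argument names here are illustrative, not taken from either cited implementation):

```python
import numpy as np

def bcsr_spmv(block_vals, block_cols, row_ptr, x, b):
    """y = A @ x for a matrix stored in BCSR format with b x b blocks.
    block_vals: (nblocks, b, b) array of dense blocks
    block_cols: block-column index of each stored block
    row_ptr:    offset of each block row into block_vals
    """
    nbrows = len(row_ptr) - 1
    y = np.zeros(nbrows * b)
    for i in range(nbrows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = block_cols[k]
            # accumulate the contribution of block (i, j)
            y[i * b:(i + 1) * b] += block_vals[k] @ x[j * b:(j + 1) * b]
    return y
```

Block reorderings of the kind studied in [17] change the order of the inner loop's memory accesses without changing this result, which is why they can affect GPU throughput.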
“…[57,56]: In [56], a sparse CGM is implemented on the GPU. An analytical model is presented which is used for optimising two implementation parameters: the number of threads and the size of the CUDA blocks.…”
Section: Discussion
confidence: 99%
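For context, the conjugate gradient method (CGM) tuned in [56] follows the standard iteration below. This is a textbook NumPy sketch, not the cited GPU code; on the GPU, each `A @ p` product becomes the sparse kernel whose thread count and CUDA block size the analytical model selects:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Plain CG for a symmetric positive definite A.
    Per iteration: one matrix-vector product, two dot products,
    and three vector updates -- the SpMV dominates the cost."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```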
“…Parameters of the model are the warp size and the number of streaming processors of the GPU, which are machine-specific, as well as the length of each matrix row, which is data-specific. In [57] a model for executing a parallel CGM on multiple GPUs is set up. The model considers the dimension of the problem and the total number of stored elements in the matrix.…”
Section: Discussion
confidence: 99%
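As a rough illustration of the kind of multi-GPU model described for [57] (not the paper's actual formulas), a per-iteration time estimate could combine the problem dimension, the number of stored matrix elements, and the GPU count. Every constant below is a hypothetical placeholder, not a measured value:

```python
def estimate_cg_iteration_time(n, nnz, n_gpus, bw=200e9, flop_rate=1e12):
    """Toy roofline-style estimate of one CG iteration.
    n:       problem dimension (vector length)
    nnz:     number of stored matrix elements
    n_gpus:  GPUs sharing the work evenly (communication ignored)
    Assumes 8-byte values, 4-byte indices, and roughly six
    length-n vector streams per iteration -- all illustrative."""
    bytes_moved = 12 * nnz + 8 * 6 * n   # SpMV traffic + vector ops
    flops = 2 * nnz + 10 * n             # multiply-adds, dots, updates
    # whichever of bandwidth or compute binds, split across GPUs
    return max(bytes_moved / bw, flops / flop_rate) / n_gpus
```

Even this crude sketch reproduces the qualitative behavior the model captures: time grows with the number of stored elements and shrinks as GPUs are added, until communication (omitted here) starts to dominate.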