2010
DOI: 10.1147/jrd.2010.2071191

Implementation and performance analysis of parallel conjugate gradient on the Cell Broadband Engine


Cited by 3 publications (4 citation statements); references 13 publications.
“…All computational intensive elements in CG and ORTHOMIN (i.e. inner products and matrix vector operations) are trivially parallelizable [2], and as a result, researchers [2,6] have placed importance on analyzing the parallelization of sparse matrix-vector multiplication (SpMV), given its role in solving linear systems and eigenvalue problems that arise in scientific and engineering applications. The methods for efficiently manipulating sparse matrix structures are of utmost importance to the performance of many applications because they arise in numerous computational disciplines.…”
Section: Background (mentioning; confidence: 99%)
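The excerpt calls inner products and SpMV "trivially parallelizable." The minimal Python sketch below illustrates why. It is not the cited authors' Cell code. It assumes the matrix is stored in CSR format and gives each worker a contiguous block of rows. Each entry of y = Ax depends only on its own row of A and the shared vector x, so row blocks need no communication. A dot product needs only a final sum of the per-worker partial sums. The helper names (`csr_from_dense`, `row_blocks`, `spmv_block`, `parallel_spmv`, `parallel_dot`) are hypothetical. Python threads appear here only to show the decomposition: CPython's GIL means they will not speed up this pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def csr_from_dense(dense):
    """Build hypothetical CSR arrays (values, col_idx, row_ptr) from a dense list of rows."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def row_blocks(n, n_workers):
    """Split range(n) into contiguous (lo, hi) blocks whose sizes differ by at most one."""
    return [(w * n // n_workers, (w + 1) * n // n_workers) for w in range(n_workers)]


def spmv_block(csr, x, lo, hi):
    """Compute rows lo..hi-1 of y = A x; no row depends on any other row's result."""
    values, col_idx, row_ptr = csr
    return [sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(lo, hi)]


def parallel_spmv(csr, x, n_workers=4):
    n = len(csr[2]) - 1
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda blk: spmv_block(csr, x, *blk), row_blocks(n, n_workers)))
    # Concatenating the blocks in order reassembles the full output vector.
    return [yi for part in parts for yi in part]


def parallel_dot(u, v, n_workers=4):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda blk: sum(u[i] * v[i] for i in range(*blk)),
                                 row_blocks(len(u), n_workers)))
    # The only "communication" is reducing n_workers partial sums.
    return sum(partials)
```

For example, `parallel_spmv(csr_from_dense([[4, -1, 0], [-1, 4, -1], [0, -1, 4]]), [1, 2, 3])` returns `[2, 4, 10]`.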
“…Unlike the parallel Conjugate Gradient algorithm [2], in ORTHOMIN, because of large storage requirements, we opted for row-wise decomposition of the matrix, where PPE acts as a master, and notifies the SPEs to perform matrix vector products and the dot products (which consumes a significant amount of time to execute). During each loop, the PPE has to notify each SPE through the signaling API (see Figure 2), to fetch the corresponding input vector based on the signal type (32-bit data), perform the partial matrix vector and dot product operations and return the partial output vector to the main memory through DMA calls and notify PPE that the assigned task has been finished.…”
Section: Parallel Implementation (mentioning; confidence: 99%)
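The master/worker scheme described in this excerpt can be sketched in Python under several assumptions. Python threads stand in for the SPEs, and `queue.Queue` objects stand in for the Cell signaling API and DMA transfers. None of this is the cited implementation. The master sends every worker the current vectors. Each worker multiplies its own row block and accumulates its partial dot products. The master then assembles the full output vector and sums the partials. To keep the example short, the driver is ORTHOMIN(0), the minimal residual method, which keeps no previous search directions. It is not the full ORTHOMIN of the quoted work. The names `spe_like_worker`, `RowwiseMaster`, and `minimal_residual` are hypothetical.

```python
import queue
import threading


def spe_like_worker(rows, tasks, results):
    """Hypothetical worker owning a fixed row block: list of (row index, {col: value})."""
    while True:
        task = tasks.get()              # blocks until the master "signals" a task
        if task is None:                # sentinel value: shut down
            return
        p, r = task
        y_part, r_dot_y, y_dot_y = {}, 0.0, 0.0
        for i, row in rows:
            yi = sum(a * p[j] for j, a in row.items())
            y_part[i] = yi
            r_dot_y += r[i] * yi        # partial (r, Ap)
            y_dot_y += yi * yi          # partial (Ap, Ap)
        results.put((y_part, r_dot_y, y_dot_y))   # "notify" the master


class RowwiseMaster:
    def __init__(self, a_rows, n_workers=2):
        self.n = len(a_rows)
        self.results = queue.Queue()
        self.tasks, self.threads = [], []
        for w in range(n_workers):
            lo, hi = w * self.n // n_workers, (w + 1) * self.n // n_workers
            rows = [(i, a_rows[i]) for i in range(lo, hi)]
            q = queue.Queue()
            t = threading.Thread(target=spe_like_worker, args=(rows, q, self.results), daemon=True)
            t.start()
            self.tasks.append(q)
            self.threads.append(t)

    def matvec_and_dots(self, p, r):
        """Return (A p, (r, A p), (A p, A p)) assembled from every worker's partials."""
        for q in self.tasks:
            q.put((p, r))
        y, r_dot_y, y_dot_y = [0.0] * self.n, 0.0, 0.0
        for _ in self.tasks:
            y_part, a, b = self.results.get()
            for i, yi in y_part.items():
                y[i] = yi
            r_dot_y += a
            y_dot_y += b
        return y, r_dot_y, y_dot_y

    def close(self):
        for q in self.tasks:
            q.put(None)
        for t in self.threads:
            t.join()


def minimal_residual(a_rows, b, n_workers=2, tol=1e-10, max_iter=1000):
    """ORTHOMIN(0): search direction p = r, alpha = (r, Ar) / (Ar, Ar)."""
    master = RowwiseMaster(a_rows, n_workers)
    try:
        x, r = [0.0] * len(b), list(b)
        b_norm = sum(bi * bi for bi in b) ** 0.5
        for _ in range(max_iter):
            if sum(ri * ri for ri in r) ** 0.5 <= tol * b_norm:
                break
            ar, r_ar, ar_ar = master.matvec_and_dots(r, r)
            alpha = r_ar / ar_ar
            x = [xi + alpha * ri for xi, ri in zip(x, r)]
            r = [ri - alpha * ari for ri, ari in zip(r, ar)]
        return x
    finally:
        master.close()
```

The sketch fuses the matrix-vector product with the two dot products. Each worker therefore makes one pass over its rows per iteration and returns only its slice of Ap plus two scalars, which is the kind of traffic the excerpt describes between the PPE and the SPEs.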
“…It is well known that the cost of the CG is dominated by the SpMV operation [34][35][36]. This is exemplified in Figure 1.10 for the solution of the discrete Poisson equation on a mesh over a spherical domain with 400,000 control volumes (CV) on a single CPU.…”
Section: Distribution of Execution Time (mentioning; confidence: 99%)
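One way to examine the claim that SpMV dominates CG is to instrument each kernel. The sketch below is a plain Python assumption, not the setup of the quoted study. It runs unpreconditioned CG on the 1D Dirichlet Poisson matrix tridiag(-1, 2, -1) in CSR form and accumulates wall-clock time per kernel type. Each iteration performs one SpMV, two dot products, and three vector updates. The measured shares depend on the matrix and on the language. For example, a 3D unstructured mesh has many more nonzeros per row than this 1D stencil. No particular split is implied. The helpers `poisson_1d_csr`, `spmv`, `dot`, `axpy`, and `cg` are hypothetical.

```python
import math
import time


def poisson_1d_csr(n):
    """CSR arrays for tridiag(-1, 2, -1): 1D Dirichlet Poisson operator scaled by h^2."""
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j, a in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv(csr, x):
    values, col_idx, row_ptr = csr
    return [sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha, u, v):
    """Return alpha * u + v as a new list."""
    return [alpha * ui + vi for ui, vi in zip(u, v)]


def cg(csr, b, tol=1e-10, max_iter=None):
    """Unpreconditioned CG from x0 = 0; returns (x, iterations, seconds per kernel type)."""
    n = len(b)
    max_iter = max_iter or 10 * n
    t = {"spmv": 0.0, "dot": 0.0, "axpy": 0.0}

    def timed(kind, f, *args):
        start = time.perf_counter()
        out = f(*args)
        t[kind] += time.perf_counter() - start
        return out

    x, r = [0.0] * n, list(b)
    p = list(r)
    rs = timed("dot", dot, r, r)
    stop = tol * math.sqrt(rs)          # relative residual criterion
    it = 0
    while it < max_iter and math.sqrt(rs) > stop:
        ap = timed("spmv", spmv, csr, p)            # the single SpMV per iteration
        alpha = rs / timed("dot", dot, p, ap)
        x = timed("axpy", axpy, alpha, p, x)
        r = timed("axpy", axpy, -alpha, ap, r)
        rs_new = timed("dot", dot, r, r)
        p = timed("axpy", axpy, rs_new / rs, p, r)  # p = beta * p + r
        rs = rs_new
        it += 1
    return x, it, t
```

For instance, running `x, iters, t = cg(poisson_1d_csr(200), [1.0] * 200)` and then `print({k: v / sum(t.values()) for k, v in t.items()})` reports the fraction of kernel time spent in each kernel type for that run.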