Proceedings of the XXVI International Symposium on Lattice Field Theory — PoS(LATTICE 2008) 2009
DOI: 10.22323/1.066.0026

Cell processor implementation of a MILC lattice QCD application

Abstract: We present results of the implementation of one MILC lattice QCD application (simulation with dynamical clover fermions using the hybrid molecular dynamics R algorithm) on the Cell Broadband Engine processor. Fifty-four individual computational kernels responsible for 98.8% of the overall execution time were ported to the Cell's Synergistic Processing Elements (SPEs). The remaining application framework, including MPI-based distributed code execution, was left to the Cell's PowerPC processor. We observe that we …
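As a rough illustration of what the 98.8% figure implies, the sketch below applies Amdahl's law to the fraction of execution time that was ported to the SPEs. This is not a calculation taken from the paper: the function name `amdahl_speedup` and the per-kernel speedup values are hypothetical, and the model assumes the unported 1.2% runs at its original speed.

```python
# Minimal sketch: Amdahl's law applied to the ported fraction from the abstract.
# Assumption: the remaining 1.2% of the run time is not accelerated at all.

PORTED_FRACTION = 0.988  # share of execution time covered by the 54 ported kernels


def amdahl_speedup(ported_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `ported_fraction` of the run time is sped up
    by `kernel_speedup` and the rest is left unchanged."""
    serial_part = 1.0 - ported_fraction
    return 1.0 / (serial_part + ported_fraction / kernel_speedup)


# Upper bound on overall speedup, reached only if the kernels cost nothing.
upper_bound = 1.0 / (1.0 - PORTED_FRACTION)

if __name__ == "__main__":
    print(f"upper bound on overall speedup: {upper_bound:.1f}x")
    for s in (2.0, 10.0, 50.0):  # hypothetical kernel speedups, not measured values
        print(f"kernel speedup {s:>4.0f}x -> overall {amdahl_speedup(PORTED_FRACTION, s):.2f}x")
```

Under these assumptions the overall speedup cannot exceed roughly 83x, however fast the SPE kernels become, because the unported framework code starts to dominate.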

Citations: cited by 3 publications (4 citation statements)
References: 6 publications
“…LQCD has also been implemented on other heterogeneous devices, primarily on the Cell Broadband Engine. Efforts in this direction have been reported in [14], [15] as part of the "QCD Parallel Computing on the Cell Broadband Engine" (QPACE) project and elsewhere [16], [17].…”
Section: Related Work (mentioning)
confidence: 99%
“…Some groups which reported on porting lattice QCD kernels or full lattice QCD applications to the Cell processor either avoided some optimization problems, e.g., by restricting themselves to the on-chip memory [13], or have limited their efforts to optimize their data layout [14]. An analysis of different optimization strategies can be found in [15].…”
Section: Application Code and Performance (mentioning)
confidence: 99%
“…This was clearly demonstrated in the MILC implementation when accessing lattice site data in a strided manner. Thus, with an odd stride size (e.g., 1, 3, 5), the maximum bandwidth of 25.38 GB/s can be achieved, whereas with the stride size of 16, only 2.13 GB/s is achievable because only one out of 16 memory banks is used all the time [20].…”
Section: Conclusion and Lessons Learned (mentioning)
confidence: 99%
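The bank-conflict effect described in the statement above can be sketched with a toy model. The code assumes that consecutive elements are interleaved across the 16 memory banks, so that element `i` lives in bank `i mod 16`; this mapping is a simplification chosen for illustration, not the documented Cell memory layout, and `banks_touched` is a hypothetical helper. The model only counts how many banks a strided access pattern visits and does not try to reproduce the quoted bandwidth figures.

```python
# Toy model of strided access over interleaved memory banks.
# Assumption: element i maps to bank (i mod NUM_BANKS); real hardware
# interleaves at a fixed byte granularity, which this sketch ignores.

NUM_BANKS = 16  # number of banks mentioned in the quoted statement


def banks_touched(stride: int, num_banks: int = NUM_BANKS, accesses: int = 64) -> set[int]:
    """Return the set of bank indices visited by `accesses` strided reads."""
    return {(i * stride) % num_banks for i in range(accesses)}


if __name__ == "__main__":
    for stride in (1, 3, 5, 16):
        used = banks_touched(stride)
        print(f"stride {stride:>2}: {len(used)} of {NUM_BANKS} banks in use")
```

In this model every odd stride visits all 16 banks, because an odd number shares no common factor with 16, while a stride of 16 returns to the same bank on every access, which matches the qualitative behaviour the statement reports.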
“…The ERI kernel is an example of a problem whose computational complexity is O(N^4). Some of the results described in this paper have been presented in conference papers [19,20] while the quantum chemistry code results are new.…”
Section: Introduction (mentioning)
confidence: 99%