Proceedings of the XXIIIrd International Symposium on Lattice Field Theory, PoS(LAT2005), 2005
DOI: 10.22323/1.020.0019

Performance of machines for lattice QCD simulations

Abstract: We review the architecture of massively parallel machines used for lattice QCD simulations and present benchmarks for the performance of popular algorithms on these platforms. We cover commercial supercomputers, PC clusters, and custom-designed machines. We also speculate on future developments.

Cited by 6 publications (8 citation statements); references 13 publications.

“…The components of a node are integrated in what we currently call a "brick". 2 The basis of a brick is a midplane, see Fig. 2 (left).…”
Section: System Design (mentioning; confidence: 99%)

“…They also influenced the design of commercial supercomputers such as BlueGene. A probably incomplete list of such machines includes ACPMAPS, GF11, Fermi-256, QCDSP, QCDOC (all in the US), QCDPAX, CP-PACS, PACS-CS (Japan), as well as APE100, APEmille, apeNEXT, and QPACE 1 (Europe), see also [2] for a review.…”
Section: Introduction (mentioning; confidence: 99%)

“…To get a better picture, it is instructive to compare the performance of the GPU code with an equivalent code running purely on CPUs. The typical CPU dslash performance for double precision implementations is 1-2 GFlops/s [15]. Our own CPU implementation runs at 1.5 GFlops/s per core.…”
Section: Multi-GPU Dslash Implementation (mentioning; confidence: 99%)

“…As an example of a conventional x86 processor, a 2.4GHz Xeon system was used, along with a high performance SSE3 implementation of the Dslash operation [11,12] encapsulated in a library called intel sse wilson dslash. This is included with the Chroma library and optionally compiled in on systems which support SSE.…”
Section: Performance (mentioning; confidence: 99%)
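
The Dslash figures quoted above (1-2 GFlop/s for double-precision CPU implementations, 1.5 GFlop/s per core) are sustained rates derived from timings. The following is a minimal Python sketch of how such a rate is commonly computed, assuming the conventional count of 1320 floating-point operations per lattice site for one application of the Wilson Dslash operator (no clover term). The function names, the 16^4 lattice, and the timing are hypothetical illustrations, not values taken from the paper.

```python
import math

# Assumption: conventional flop count for one Wilson Dslash application
# on a single lattice site (no clover term); other codes may count differently.
WILSON_DSLASH_FLOPS_PER_SITE = 1320


def lattice_volume(dims):
    """Number of lattice sites for extents such as (Lx, Ly, Lz, Lt)."""
    return math.prod(dims)


def dslash_gflops(dims, seconds_per_application,
                  flops_per_site=WILSON_DSLASH_FLOPS_PER_SITE):
    """Sustained GFlop/s for one Dslash application over the whole lattice."""
    if seconds_per_application <= 0:
        raise ValueError("timing must be positive")
    total_flops = flops_per_site * lattice_volume(dims)
    return total_flops / seconds_per_application / 1e9


if __name__ == "__main__":
    # Hypothetical timing: 16^4 lattice, 0.05767168 s per Dslash application.
    print(f"{dslash_gflops((16, 16, 16, 16), 0.05767168):.2f} GFlop/s")  # 1.50 GFlop/s
```

Dividing the result by the number of cores that ran the kernel gives a per-core figure of the kind quoted above; implementations that include a clover term or use a different flop-counting convention will report different rates for the same timing.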