SC22: International Conference for High Performance Computing, Networking, Storage and Analysis 2022
DOI: 10.1109/sc41404.2022.00012

Scaling Correlated Fragment Molecular Orbital Calculations on Summit

Cited by 14 publications (10 citation statements). References 22 publications.
“…For GPU implementations, it is important that the integrals are never stored in the GPU global memory, as this leads to a large limitation of resources and a high number of read operations that are inherently slow. The Fock build described in refs and , which uses the aforementioned integral routines, uses an atomic-operation-oriented algorithm for digesting the integrals into the Fock matrix, thereby keeping synchronization at a minimum. This algorithm was implemented and benchmarked against state-of-the-art programs such as QUICK and Terachem, showing promising speedups on the NVIDIA V100 architecture. Figure shows speedups against Terachem and QUICK using the same benchmark systems as those for the ERIs.…”
Section: Graphical Processing Units (mentioning)
confidence: 99%
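The atomic-operation-oriented digestion mentioned in the statement above can be illustrated on the CPU. This is a minimal sketch, not the code from the cited references. It assumes a closed-shell density matrix `dens` (so that F = h + J - K/2), a hypothetical function name `fock_atomic`, and, for simplicity, a precomputed four-index integral array `eri` in chemists' notation (pq|rs); a real GPU kernel would compute each integral on the fly instead of storing it. Every integral quartet acts as an independent "thread" that adds its Coulomb and exchange contributions directly into the shared Fock matrix. NumPy's `np.add.at` accumulates correctly at repeated indices and stands in here for a GPU `atomicAdd`.

```python
import numpy as np


def fock_atomic(h: np.ndarray, eri: np.ndarray, dens: np.ndarray) -> np.ndarray:
    """Closed-shell Fock matrix F = h + J - K/2 built by scattering integrals.

    Hypothetical sketch: each quartet (pq|rs) behaves like one GPU thread that
    adds its contributions directly into the shared matrix. np.add.at does
    unbuffered accumulation at repeated indices, standing in for atomicAdd.
    No permutational symmetry is used, so all n**4 quartets are visited.
    """
    n = h.shape[0]
    # Flattened quartet indices, in the same C order as eri.reshape(-1).
    p, q, r, s = np.indices((n, n, n, n)).reshape(4, -1)
    g = eri.reshape(-1)
    fock = np.array(h, dtype=float, copy=True)
    # Coulomb: F_pq += D_rs (pq|rs)
    np.add.at(fock, (p, q), dens[r, s] * g)
    # Exchange: F_pr -= 1/2 D_qs (pq|rs)
    np.add.at(fock, (p, r), -0.5 * dens[q, s] * g)
    return fock
```

Because each quartet writes independently, no ordering or barrier between quartets is needed; the only cost is contention when many quartets update the same Fock element, which is exactly what atomic operations resolve on a GPU.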
“…There have since been several high-performance GPU accelerated RI-MP2 implementations in various software packages, 77,133 including those by some of the present authors. 80,81 Our implementation, which achieves linear scaling with system size through usage of molecular fragmentation, enabled us to perform RI-MP2 energy calculations using the cc-pVDZ/cc-pVDZ-RIFIT basis sets on over 145 000 atoms within ∼40 min, using ∼27 000 GPUs on the Summit supercomputer at the Oak Ridge National Laboratory. 81 While numerous efficient CPU-based MP2 gradient algorithms and implementations have been developed, in the literature to date, there have only been two attempts to use GPUs to accelerate the MP2 or RI-MP2 gradients.…”
Section: Introduction (mentioning)
confidence: 99%
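The linear scaling through molecular fragmentation described above rests on a simple counting argument: if only nearby fragment pairs (dimers) receive an explicit correlated calculation, their number grows linearly with the number of fragments, whereas all pairs grow quadratically. The sketch below illustrates that count. It is an assumption-laden toy, not the selection rule used on Summit: the centroid-distance cutoff, the evenly spaced chain geometry, and the function names `count_near_dimers` and `linear_chain` are all hypothetical.

```python
import numpy as np


def count_near_dimers(centroids: np.ndarray, cutoff: float) -> int:
    """Count fragment pairs (I < J) whose centroids lie within `cutoff`.

    Hypothetical sketch: in a pairwise fragmentation scheme only these
    nearby dimers would get an explicit correlated calculation, while
    distant pairs are skipped or treated approximately.
    """
    diff = centroids[:, None, :] - centroids[None, :, :]
    # Dense (M, M) distance matrix: fine for a toy, O(M^2) memory.
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(centroids), k=1)  # unique pairs I < J
    return int(np.count_nonzero(dist[i, j] <= cutoff))


def linear_chain(m: int, spacing: float = 1.0) -> np.ndarray:
    """Centroids of m fragments evenly spaced along the x axis (toy model)."""
    xs = spacing * np.arange(m, dtype=float)
    return np.column_stack([xs, np.zeros(m), np.zeros(m)])


if __name__ == "__main__":
    for m in (10, 100, 1000):
        near = count_near_dimers(linear_chain(m), cutoff=2.5)
        total = m * (m - 1) // 2
        print(m, near, total)  # near = 2m - 3 (linear), total = m(m-1)/2
```

With a cutoff of 2.5 spacings each fragment pairs only with neighbours one or two positions away, so the near-dimer count is 2m - 3. A production code would find these pairs with a neighbour list rather than a dense distance matrix, so that the screening step itself also stays linear.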
“…While these methods hold considerable promise, their practical application to large molecules is hindered by the steep $\mathcal{O}(N^5)$ computational scaling of the underlying MP2 calculations. Consequently, there has been tremendous research effort over recent decades on devising faster and more efficient algorithms and software for the evaluation of the MP2 energy and gradients.…”
Section: Introduction (mentioning)
confidence: 99%
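As a worked illustration of why the $\mathcal{O}(N^5)$ scaling quoted above is so restrictive, and why fragmentation helps, consider the following back-of-the-envelope estimate. It assumes cost proportional to $N^5$ with a constant prefactor $c$ (taking $N$ as a measure of system size, e.g. the number of basis functions), fragments of fixed size $n$, and a number of nearby dimers $K(N)$ that grows linearly with $N$; constant factors, embedding, and communication costs are ignored.

```latex
% Single canonical MP2 calculation on a system of size N
T_{\mathrm{MP2}}(N) \approx c\,N^{5},
\qquad
\frac{T_{\mathrm{MP2}}(2N)}{T_{\mathrm{MP2}}(N)} = 2^{5} = 32

% Fragmentation into M = N/n fragments of fixed size n (monomers)
T_{\mathrm{mono}} \approx M\,c\,n^{5} = \frac{N}{n}\,c\,n^{5} = c\,n^{4}\,N

% K(N) \propto N nearby dimers, each of size 2n
T_{\mathrm{dimer}} \approx K(N)\,c\,(2n)^{5} \propto N
```

Doubling the system therefore costs 32 times more in a single monolithic calculation but only about twice as much when the work is split into fixed-size fragments and nearby dimers, which are also independent tasks that can be distributed across many GPUs.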