2009
DOI: 10.1021/ct900543q

Accelerating Correlated Quantum Chemistry Calculations Using Graphical Processing Units and a Mixed Precision Matrix Multiplication Library

Abstract: Two new tools for the acceleration of computational chemistry codes using graphical processing units (GPUs) are presented. First, we propose a general black-box approach for the efficient GPU acceleration of matrix-matrix multiplications where the matrix size is too large for the whole computation to be held in the GPU’s onboard memory. Second, we show how to improve the accuracy of matrix multiplications when using only single-precision GPU devices by proposing a heterogeneous computing model, whereby single-…
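The first idea in the abstract, multiplying matrices that do not fit in device memory, can be illustrated with a tiled product. The following is a minimal sketch, not the authors' library: the names `matmul_block` and `blocked_matmul` and the `tile` parameter are hypothetical, and a pure-Python routine stands in for the GPU call so the example runs anywhere.

```python
def matmul_block(a, b):
    """Multiply two small dense blocks given as lists of rows.

    Assumption: this stands in for a single GPU matrix-multiply call on
    blocks small enough to fit in device memory.
    """
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def blocked_matmul(a, b, tile):
    """Compute the product a*b by streaming tile-sized sub-blocks.

    Only three tile-sized blocks are needed at any one time, so the
    working set is bounded by the tile size rather than the matrix size.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Slice out one block of each operand (edge blocks may be smaller).
                a_blk = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_blk = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                part = matmul_block(a_blk, b_blk)
                # Accumulate the partial product into the matching block of c.
                for di, part_row in enumerate(part):
                    for dj, value in enumerate(part_row):
                        c[i0 + di][j0 + dj] += value
    return c
```

In a real GPU setting the tile size would presumably be chosen from the available device memory, and the accumulation step could stay on the host; both choices here are assumptions for illustration only.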

citations
Cited by 84 publications
(104 citation statements)
references
References 23 publications
“…By contrast, the flagship Tesla K20x by Nvidia (2688 CUDA cores at 732 MHz) has a peak of 1.31 TFlop/s for double-precision arithmetic and a memory bandwidth of 250 GB/s with ECC (error-correcting code) off. Hence many groups decided to develop GPU-accelerated programs [13,14] to take advantage of this promising device for quantum Monte Carlo computations [15,16], evaluation of two-electron integrals [17][18][19][20][21][22], DFT calculations [23][24][25][26][27][28][29][30], high-level correlated ab initio methods [31][32][33][34][35][36][37][38], and semiempirical quantum chemistry [39,40]. The other chapters of this book contain an excellent overview of many of these porting efforts.…”
Section: Introduction (mentioning)
confidence: 99%
“…Despite these exciting developments in basic theory, fast numerical algorithms, and novel hardware utilization, [1][2][3][4][5] it is not yet possible to routinely simulate such large systems using readily available computer resources. However, emerging nano-scale simulation challenges involving, for example, large biomolecular aggregates or nanotechnology devices are increasing the demand for first-principles methods that can deliver this performance.…”
Section: Introduction (mentioning)
confidence: 99%
“…These include molecular dynamics and quantum Monte Carlo simulations, density-functional theory and self-consistent field calculations [1,2] as well as correlated quantum chemistry applications [3,4]. Efficiency gains of between one and three orders of magnitude have been reported compared to conventional implementations on a CPU.…”
Section: Introduction (mentioning)
confidence: 99%
“…In summary, we describe our efforts to develop tools for the GPU acceleration of correlated quantum chemistry calculations [3,4]. We address the issues of limited GPU device memory, as well as achieving higher accuracy using only single-precision GPU operations.…”
Section: Introduction (mentioning)
confidence: 99%