Solvers for $\mathcal{O} (N)$ Electronic Structure in the Strong Scaling Limit

Bock, Nicolas; Challacombe, Matt; Kalé, Laxmikant V.

doi:10.1137/140974602

Cited by 14 publications

(14 citation statements)

References 97 publications


“…The graph-based electronic structure theory combines the natural parallelism of a divide and conquer approach [12][13][14][15][16][17] with the automatically adaptive and tunable accuracy of a thresholded sparse matrix algebra, [18][19][20][21][22][23][24][25][26][27][28][29][30][31] which can be combined with fast, low pre-factor, recursive Fermi operator expansion methods [32][33][34][35][36][37][38][39][40][41] and can be applied to modern formulations of Born-Oppenheimer molecular dynamics. [42][43][44][45][46][47][48][49][50] The article is outlined as follows: first we introduce the graph-based formalism for general sparse matrix polynomials expanded over separate subgraphs, thereafter we apply the methodology to the Fermi-operator expansion in electronic structure theory with demonstrations for a protein-like structure of polyalanine solvated in water, before analyzing applications in molecular dynamics simulations.…”

Section: Introduction (mentioning)

confidence: 99%

Graph-based linear scaling electronic structure theory

Niklasson, Mniszewski, Negre, et al. 2016. The Journal of Chemical Physics

Self Cite


We show how graph theory can be combined with quantum theory to calculate the electronic structure of large complex systems. The graph formalism is general and applicable to a broad range of electronic structure methods and materials, including challenging systems such as biomolecules. The methodology combines well-controlled accuracy, low computational cost, and natural low-communication parallelism. This combination addresses substantial shortcomings of linear scaling electronic structure theory, in particular with respect to quantum-based molecular dynamics simulations. © 2016 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license.


“…Parallel implementations of SpAMM. Bock et al [10] present two parallel implementations of the SpAMM algorithm. The first one, which uses the OpenMP application programming interface, exploits parallel quad-tree traversal using untied task, i.e.…”

Section: 2 (mentioning)

confidence: 99%

“…The first expression in (10) is the element-wise error introduced by truncation of matrices before multiplication, and its bound is derived in Lemma 1. The second expression in (10) is the error introduced by the SpAMM algorithm assuming that the matrices have been already truncated, and its bound is derived in Lemma 2. Combination of those results gives us…”

Section: Error Estimations (mentioning)

confidence: 99%

Approximate multiplication of nearly sparse matrices with decay in a fully recursive distributed task-based parallel framework

Artemov 2019. Preprint


In this paper we consider parallel implementations of approximate multiplication of large matrices with exponential decay of elements. Such matrices arise in computations related to electronic structure calculations and some other fields of science. Commonly, sparsity is introduced by truncation of input matrices. In turn, the sparse approximate multiplication algorithm [M. Challacombe and N. Bock, arXiv preprint 1011.3534, 2010] performs truncation of sub-matrix products. We consider these two methods and their combination, i.e. truncation of both input matrices and sub-matrix products. Implementations done using the Chunks and Tasks programming model and library [E. H. Rubensson and E. Rudberg, Parallel Comput., 40:328-343, 2014] are presented and discussed. The absolute error asymptotic behavior is derived. A comparison between the three methods in terms of performance is done on a model problem. The algorithms are also applied to matrices coming from large chemical systems with $\sim 10^6$ atoms.
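The idea of truncating sub-matrix products, as described in the abstract above, can be illustrated with a minimal serial sketch. It is not the implementation from any of the cited papers: the function names `spamm` and `decay_matrix`, the parameters `tau` and `leaf`, and the restriction to square matrices whose dimension is a power of two are all assumptions made for illustration. The sketch recurses over quadrants and skips any sub-product whose Frobenius-norm bound falls below the threshold.

```python
import numpy as np


def decay_matrix(n, rate=1.0):
    """Hypothetical test matrix with exponential decay away from the diagonal."""
    i = np.arange(n)
    return np.exp(-rate * np.abs(i[:, None] - i[None, :]))


def spamm(a, b, tau, leaf=2):
    """Approximate a @ b for square matrices whose size is a power of two.

    A sub-product is skipped when ||a||_F * ||b||_F < tau. Since
    ||a @ b||_F <= ||a||_F * ||b||_F, every skipped term has norm below tau.
    """
    n = a.shape[0]
    # np.linalg.norm of a 2-D array defaults to the Frobenius norm.
    if np.linalg.norm(a) * np.linalg.norm(b) < tau:
        return np.zeros((n, n))
    if n <= leaf:
        return a @ b  # dense multiply at the leaves of the quadtree
    h = n // 2
    c = np.zeros((n, n))
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                # C_ij += A_ik * B_kj, recursing into each quadrant pair
                c[i * h:(i + 1) * h, j * h:(j + 1) * h] += spamm(
                    a[i * h:(i + 1) * h, k * h:(k + 1) * h],
                    b[k * h:(k + 1) * h, j * h:(j + 1) * h],
                    tau,
                    leaf,
                )
    return c
```

With `tau = 0.0` nothing is pruned and the result matches the exact product up to rounding. Larger values of `tau` skip more sub-products and so reduce work at the cost of accuracy. A parallel version would distribute the eight recursive calls as tasks, which this sketch does not attempt.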


“…Similarly, numerical applications such as scientific simulations also use SpGEMM as a subroutine. Typical examples include the Algebraic Multigrid (AMG) method for solving sparse system of linear equations [9], volumetric mesh processing [10], and linear-scaling electronic structure calculations [11].…”

Section: Introduction (mentioning)

confidence: 99%

Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors

et al. 2019


Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware specific optimizations for multi- and many-core processors are lacking and a detailed analysis of their performance under various use cases and matrices is not available. We firstly identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing or KNL). Specifically targeting many-core processors, we develop a hashtable-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Different from the literature, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search or triangle counting. Our hash-table and heap-based algorithms are showing significant speedups from libraries in the majority of the cases while different algorithms dominate the other scenarios with different matrix size, sparsity, compression factor and operation type. We wrap up in-depth evaluation results and make a recipe to give the best SpGEMM algorithm for target scenario. We build the performance model for hash-table and heap-based algorithms, which supports the recipe. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix. Finally, we integrate our implementations into a large-scale protein clustering code named HipMCL, accelerating its SpGEMM kernel by up to 10X and achieving an overall performance boost for the whole HipMCL application by 2.6X.
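To make the hash-table approach mentioned in the abstract above concrete, the following is a minimal serial sketch of row-wise SpGEMM on CSR inputs, with a Python `dict` standing in for the per-row hash table. It is not the optimized KNL implementation the abstract describes: the function name `spgemm_hash`, its argument layout, and the `sort_rows` flag are assumptions made for illustration. The flag shows where the optional per-row sort of output columns would occur.

```python
def spgemm_hash(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, sort_rows=False):
    """Multiply two CSR matrices A and B, returning C = A @ B in CSR form.

    Each matrix is given as (row pointers, column indices, values).
    Row i of C is accumulated in a hash table keyed by column index.
    """
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # hash table: column j -> partial sum of C[i, j]
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i, k] and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Sorting the columns is optional; skipping it avoids a per-row sort.
        cols = sorted(acc) if sort_rows else list(acc)
        c_idx.extend(cols)
        c_val.extend(acc[j] for j in cols)
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

With `sort_rows=False` the columns of each output row appear in the order they were first encountered, which is still a valid CSR matrix for consumers that do not rely on sorted indices.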
