2021
DOI: 10.1016/j.cpc.2021.108081
Solving the Bethe-Salpeter equation on massively parallel architectures

Cited by 7 publications (3 citation statements)
References 53 publications
“…no time dependence, which is the most frequent approach. However, we note that current efforts also investigate extensions to the frequency dependence of screening [27,28]. By taking the Fourier transform we obtain the corresponding potential in real space,…”
Section: General Formalism (mentioning, confidence: 99%)
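The excerpt breaks off just before the formula it introduces. Purely as a hedged illustration, assuming a statically screened Coulomb interaction W(q) = 4πe²/(ε(q)q²) (neither the screening model nor the prefactors appear in the excerpt), the Fourier transform back to real space would take the form:

    % Assumed model, not from the excerpt: static screening
    % W(\mathbf{q}) = 4\pi e^{2} / \bigl(\varepsilon(\mathbf{q})\, q^{2}\bigr)
    W(\mathbf{r}) \;=\; \int \frac{\mathrm{d}^{3}q}{(2\pi)^{3}}\,
        \frac{4\pi e^{2}}{\varepsilon(\mathbf{q})\, q^{2}}\,
        e^{\mathrm{i}\,\mathbf{q}\cdot\mathbf{r}}
    % For a constant dielectric \varepsilon this reduces to the familiar
    % real-space result W(r) = e^{2}/(\varepsilon r).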
“…The reason for the performance drop is that communication (collective routine MPI_Allreduce) and memory copies between CPU and GPU are included in the total execution time of the Filter. In [45] (see Supplementary Materials, Table S7), the authors showed that the latency in MPI_Allreduce remains constant on more than 16 nodes, as does the impact of MPI communication on Filter performance. This is clearly observed in the 1MPIx4GPU configuration when the number of nodes is increased from 1 to 4, as no MPI communication was required on one node (only 1 MPI rank is used).…”
Section: Evaluation of MPI and GPU Binding Configurations (mentioning, confidence: 99%)
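The collective named in this excerpt, MPI_Allreduce, is a standard MPI routine. Purely as a point of reference (the snippet below is not taken from ChASE or from the cited papers, and the buffer names and sizes are invented), a minimal C++ sketch of the reduction pattern the Filter relies on could look like this:

    // Hedged sketch, not ChASE code: combine locally filtered partial results
    // across all MPI ranks with a single collective reduction.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, nprocs = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        // Stand-in for a locally filtered block of vectors (flattened); the
        // size is arbitrary and chosen only for illustration.
        const int n = 1 << 20;
        std::vector<double> local(n, static_cast<double>(rank));
        std::vector<double> global(n, 0.0);

        // Collective reduction across all ranks. With a single rank, as in the
        // one-node 1MPIx4GPU case, no inter-process communication is needed.
        MPI_Allreduce(local.data(), global.data(), n, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("reduced over %d ranks, global[0] = %g\n", nprocs, global[0]);

        MPI_Finalize();
        return 0;
    }

Because MPI_Allreduce is collective, its cost is paid on every rank; running only one rank per node, as in the 1MPIx4GPU configuration, removes that cost entirely on a single node.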
“…ChASE with configuration 1MPI×4GPUs always outperforms the other two, with 2MPI×2GPUs in between. Since QR and RR are computed redundantly on each MPI rank and operate on the full column size, the gain of the 1MPI×4GPUs configuration over the others comes from the lower communication overhead of the expensive MPI_Ibcast (see [45], Supplementary Materials, Table S7). Unlike MPI_Allreduce, the latency of the broadcasting routines increases steadily with the number of MPI ranks.…”
Section: Evaluation of MPI and GPU Binding Configurations (mentioning, confidence: 99%)
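The broadcasting routine named here, MPI_Ibcast, is the non-blocking counterpart of MPI_Bcast. Again as a hedged reference sketch (not code from ChASE or the cited papers; the buffer name and size are invented), the pattern of issuing the broadcast, overlapping local work, and then waiting for completion looks like this in C++:

    // Hedged sketch, not ChASE code: non-blocking broadcast of a redundantly
    // computed result from rank 0, completed with MPI_Wait.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Rank 0 fills the buffer; the other ranks receive it. The size is
        // arbitrary and chosen only for illustration.
        const int n = 1 << 16;
        std::vector<double> buf(n, rank == 0 ? 1.0 : 0.0);

        MPI_Request req;
        MPI_Ibcast(buf.data(), n, MPI_DOUBLE, /*root=*/0, MPI_COMM_WORLD, &req);

        // Independent local work could overlap with the broadcast here.

        MPI_Wait(&req, MPI_STATUS_IGNORE);  // buf is now identical on all ranks
        std::printf("rank %d has buf[0] = %g\n", rank, buf[0]);

        MPI_Finalize();
        return 0;
    }

Unlike the all-reduce, the cost of a broadcast grows with the number of participating ranks, which is consistent with the latency trend the excerpt describes.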