2021
DOI: 10.1016/j.cpc.2021.108081
Solving the Bethe-Salpeter equation on massively parallel architectures

Cited by 7 publications (3 citation statements)
References 53 publications
“…no time dependence, which is the most frequent approach. However, we note that current efforts also investigate extensions to the frequency dependence of screening [27,28]. By taking the Fourier transform we obtain the corresponding potential in real space,…”
Section: General Formalism (mentioning, confidence: 99%)
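The excerpt breaks off just before the formula it introduces. Purely as a hedged illustration, assuming a statically screened Coulomb interaction W(q) = 4πe²/(ε(q)q²) (neither the screening model nor the prefactors appear in the excerpt), the Fourier transform back to real space would take the form:

    % Assumed model, not from the excerpt: static screening
    % W(\mathbf{q}) = 4\pi e^{2} / \bigl(\varepsilon(\mathbf{q})\, q^{2}\bigr)
    W(\mathbf{r}) \;=\; \int \frac{\mathrm{d}^{3}q}{(2\pi)^{3}}\,
        \frac{4\pi e^{2}}{\varepsilon(\mathbf{q})\, q^{2}}\,
        e^{\mathrm{i}\,\mathbf{q}\cdot\mathbf{r}}
    % For a constant dielectric \varepsilon this reduces to the familiar
    % real-space result W(r) = e^{2}/(\varepsilon r).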
“…The reason for the performance drop is that communication (collective routine MPI_Allreduce) and memory copies between CPU and GPU are included in the total execution time of the Filter. In [45] (see Supplementary Materials, Table S7), the authors showed that the latency in MPI_Allreduce remains constant on more than 16 nodes, as does the impact of MPI communication on Filter performance. This is clearly observed in the 1MPIx4GPU configuration when the number of nodes is increased from 1 to 4, as no MPI communication was required on one node (only 1 MPI rank is used).…”
Section: Evaluation of MPI and GPU Binding Configurations (mentioning, confidence: 99%)
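The collective named in this excerpt, MPI_Allreduce, is a standard MPI routine. Purely as a point of reference (the snippet below is not taken from ChASE or from the cited papers, and the buffer names and sizes are invented), a minimal C++ sketch of the reduction pattern the Filter relies on could look like this:

    // Hedged sketch, not ChASE code: combine locally filtered partial results
    // across all MPI ranks with a single collective reduction.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, nprocs = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        // Stand-in for a locally filtered block of vectors (flattened); the
        // size is arbitrary and chosen only for illustration.
        const int n = 1 << 20;
        std::vector<double> local(n, static_cast<double>(rank));
        std::vector<double> global(n, 0.0);

        // Collective reduction across all ranks. With a single rank, as in the
        // one-node 1MPIx4GPU case, no inter-process communication is needed.
        MPI_Allreduce(local.data(), global.data(), n, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        if (rank == 0)
            std::printf("reduced over %d ranks, global[0] = %g\n", nprocs, global[0]);

        MPI_Finalize();
        return 0;
    }

Because MPI_Allreduce is collective, its cost is paid on every rank; running only one rank per node, as in the 1MPIx4GPU configuration, removes that cost entirely on a single node.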
“…ChASE with configuration 1MPI×4GPUs always outperforms the other two, with 2MPI×2GPUs in between. Since QR and RR are computed redundantly on each MPI rank and operate on the full column size, the gain of the 1MPI×4GPUs configuration over the others comes from the lower communication overhead of the expensive MPI_Ibcast (see [45], Supplementary Materials, Table S7). Unlike MPI_Allreduce, the latency of the broadcasting routines increases steadily with the number of MPI ranks.…”
Section: Evaluation of MPI and GPU Binding Configurations (mentioning, confidence: 99%)
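The broadcasting routine named here, MPI_Ibcast, is the non-blocking counterpart of MPI_Bcast. Again as a hedged reference sketch (not code from ChASE or the cited papers; the buffer name and size are invented), the pattern of issuing the broadcast, overlapping local work, and then waiting for completion looks like this in C++:

    // Hedged sketch, not ChASE code: non-blocking broadcast of a redundantly
    // computed result from rank 0, completed with MPI_Wait.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        // Rank 0 fills the buffer; the other ranks receive it. The size is
        // arbitrary and chosen only for illustration.
        const int n = 1 << 16;
        std::vector<double> buf(n, rank == 0 ? 1.0 : 0.0);

        MPI_Request req;
        MPI_Ibcast(buf.data(), n, MPI_DOUBLE, /*root=*/0, MPI_COMM_WORLD, &req);

        // Independent local work could overlap with the broadcast here.

        MPI_Wait(&req, MPI_STATUS_IGNORE);  // buf is now identical on all ranks
        std::printf("rank %d has buf[0] = %g\n", rank, buf[0]);

        MPI_Finalize();
        return 0;
    }

Unlike the all-reduce, the cost of a broadcast grows with the number of participating ranks, which is consistent with the latency trend the excerpt describes.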