2014
DOI: 10.1016/j.cpc.2014.02.021

The density matrix renormalization group algorithm on kilo-processor architectures: Implementation and trade-offs

Abstract: In the numerical analysis of strongly correlated quantum lattice models one of the leading algorithms developed to balance the size of the effective Hilbert space and the accuracy of the simulation is the density matrix renormalization group (DMRG) algorithm, in which the run-time is dominated by the iterative diagonalization of the Hamilton operator. As the most time-dominant step of the diagonalization can be expressed as a list of dense matrix operations, the DMRG is an appealing candidate to fully utilize …
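A minimal structural sketch of the point made in the abstract, not the authors' implementation: the iterative diagonalization applies the superblock Hamiltonian to the wavefunction, and each such application decomposes into a list of dense matrix–matrix multiplications of the form sum_k A_k Psi B_k^T, where Psi is the wavefunction reshaped into a (left-block x right-block) matrix. All names below (make_superblock_operator, op_pairs) are illustrative assumptions.

```python
# Hedged sketch: the superblock Hamiltonian acts on the wavefunction as
#   H|psi>  ->  sum_k  A_k @ Psi @ B_k^T,
# where A_k acts on the left block and B_k on the right block, so each term
# costs two dense GEMMs; an iterative eigensolver then only needs this matvec.
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def make_superblock_operator(op_pairs, m, n):
    """op_pairs: list of (A_k (m, m), B_k (n, n)) dense operator pairs."""
    def matvec(x):
        psi = x.reshape(m, n)               # wavefunction as a dense matrix
        out = np.zeros_like(psi)
        for A, B in op_pairs:
            out += A @ psi @ B.T            # two dense GEMMs per Hamiltonian term
        return out.ravel()
    return LinearOperator((m * n, m * n), matvec=matvec, dtype=np.float64)

# Illustrative use with random symmetric operator pairs (m, n: block dimensions).
rng = np.random.default_rng(0)
m = n = 64
def sym(d):
    M = rng.standard_normal((d, d))
    return M + M.T
op_pairs = [(sym(m), sym(n)) for _ in range(4)]
H = make_superblock_operator(op_pairs, m, n)
ground_energy, ground_state = eigsh(H, k=1, which='SA')   # iterative (Lanczos-type) solve
```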

Cited by 19 publications (12 citation statements)
References 37 publications
“…Quite recently, it has been investigated how the DMRG method can utilize the enormous computing capabilities of novel kilo-processor architectures: GPU and field-programmable gate array. In the case of the GPU, a smart hybrid CPU-GPU acceleration has been presented, which tolerates problems exceeding the GPU memory size, consequently supporting a wide range of problems and GPU configurations.…”
Section: Numerical Techniques
confidence: 99%
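A hedged sketch of the idea referred to in this citation, that problems exceeding the GPU memory can still be accelerated: the names (tiled_gemm, device_gemm, gpu_mem_bytes) are ours, not the paper's API. The product is tiled so every operand tile fits in a given device-memory budget; each tile would be streamed to the accelerator, with a CPU BLAS call as the fallback when no device is present.

```python
# Hedged sketch (assumed interface, not the paper's implementation):
# C = A @ B is computed by column tiles of B small enough that A, the
# B tile and the C tile fit together in the device memory budget.
import numpy as np

def tiled_gemm(A, B, gpu_mem_bytes=2 * 1024**3, device_gemm=None):
    """device_gemm: hypothetical callable wrapping a GPU GEMM; defaults to CPU BLAS."""
    device_gemm = device_gemm or (lambda X, Y: X @ Y)    # CPU fallback
    m, k = A.shape
    _, n = B.shape
    itemsize = A.itemsize
    per_col = (k + m) * itemsize                         # one column of B plus one of C
    tile_cols = max(1, min(n, (gpu_mem_bytes - m * k * itemsize) // per_col))
    C = np.empty((m, n), dtype=np.result_type(A, B))
    for j0 in range(0, n, int(tile_cols)):
        j1 = min(n, j0 + int(tile_cols))
        # On a GPU system, A and B[:, j0:j1] would be copied to the device here
        # and the partial product copied back; the fallback keeps it on the CPU.
        C[:, j0:j1] = device_gemm(A, B[:, j0:j1])
    return C
```

In a fuller scheme, A would also be tiled along its rows when it alone exceeds the device memory; the single-direction tiling above is only meant to show the streaming structure.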
“…When the data are no longer needed, the allocated memory is freed. Regarding the sources of parallelism, we combine operator and symmetry-sector parallelisms, similarly to the simpler shared-memory approach [41,45], in order to generate a large enough number of tasks (dense matrix–matrix operations) which can be executed in parallel. All three main steps are parallelized in a task-based manner.…”
Section: Theory
confidence: 99%
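A minimal sketch of the task structure this citation describes, under assumed data layouts (the names operators, sectors, psi_blocks are illustrative): combining operator and symmetry-sector parallelism means the work is enumerated as independent (operator, sector) tasks, each a dense matrix–matrix operation, which a pool of workers then executes.

```python
# Hedged sketch of operator- plus symmetry-sector task parallelism.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def run_tasks(operators, sectors, psi_blocks, n_workers=8):
    """operators:  dict name -> dict sector -> dense operator block;
       sectors:    iterable of symmetry-sector labels;
       psi_blocks: dict sector -> dense wavefunction block."""
    # One task per (operator, sector) pair that actually has a block.
    tasks = [(name, q) for name in operators for q in sectors
             if q in operators[name]]

    def gemm_task(name, q):
        # Dense GEMM on one symmetry block; BLAS releases the GIL, so threads scale.
        return (name, q), operators[name][q] @ psi_blocks[q]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = dict(pool.map(lambda t: gemm_task(*t), tasks))
    return results
```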
“…So, for example, instead of the aforementioned term A_rs ⊗ I ⊗ I ⊗ a_r a_s, we have A^{↑↑} ⊗ I ⊗ I ⊗ a^↑ a^↑, and the loop over rs is performed sequentially during the task execution. It is organized this way to avoid fetching small memory chunks and with a view to further GPU acceleration in the future [parallel execution of matrix–matrix multiplications performed on different slices of the same dense tensors] [45].…”
Section: Theory
confidence: 99%
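A hedged illustration of that organization (array names are ours, not the paper's): rather than issuing one tiny task per (r, s) pair, the terms are stored as slices of a single contiguous dense tensor and the (r, s) loop runs sequentially inside one task; each iteration remains a large GEMM, and the slices could later be dispatched to a GPU in parallel.

```python
# Hedged sketch: sequential (r, s) loop over slices of one contiguous tensor.
import numpy as np

def contract_rs_terms(A_rs, B_rs, psi):
    """A_rs: (n_rs, m, m) stacked left-block operators;
       B_rs: (n_rs, n, n) stacked right-block operators;
       psi:  (m, n) dense wavefunction block."""
    out = np.zeros_like(psi)
    for i in range(A_rs.shape[0]):           # sequential loop over the (r, s) pairs
        out += A_rs[i] @ psi @ B_rs[i].T     # large GEMMs on slices, no small fetches
    return out
```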