2018
DOI: 10.1016/j.cpc.2017.10.018

Massively parallel multicanonical simulations

Abstract: Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computat…
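
The scheme outlined in the abstract can be illustrated with a short host-side C++ sketch: p independent walkers sample with the same fixed weights, their histograms are merged, and the weights are updated before the next iteration. The toy "random walk in energy" model and all names below are illustrative assumptions, not the paper's actual (MPI/GPU) code.

```cpp
// Minimal, self-contained sketch of one parallel multicanonical
// weight iteration (illustrative toy model, not the paper's code).
#include <cmath>
#include <random>
#include <vector>

// Toy walker: Metropolis random walk over discrete energy levels,
// moves accepted with probability W(E')/W(E) = exp(lnW[E'] - lnW[E]).
void walker_sweeps(const std::vector<double>& lnW, std::vector<long>& hist,
                   int n_sweeps, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    int E = static_cast<int>(lnW.size()) / 2;
    for (int s = 0; s < n_sweeps; ++s) {
        int En = E + (u(rng) < 0.5 ? -1 : 1);        // propose neighbor level
        if (En >= 0 && En < static_cast<int>(lnW.size()) &&
            u(rng) < std::exp(lnW[En] - lnW[E]))
            E = En;                                  // Metropolis acceptance
        ++hist[E];
    }
}

// One iteration: p independent walkers sample with the SAME fixed
// weights (embarrassingly parallel); their histograms are merged,
// then the weights are updated for the next iteration.
void parallel_muca_iteration(std::vector<double>& lnW, int p, int n_sweeps) {
    std::vector<long> H(lnW.size(), 0);              // merged histogram
    for (int w = 0; w < p; ++w) {
        std::vector<long> h(lnW.size(), 0);
        walker_sweeps(lnW, h, n_sweeps, 1234u + w);  // independent seed per walker
        for (std::size_t e = 0; e < H.size(); ++e) H[e] += h[e];
    }
    // Standard multicanonical update lnW(E) <- lnW(E) - ln H(E):
    // frequently visited energies get their weight reduced, flattening
    // the sampled energy distribution over successive iterations.
    for (std::size_t e = 0; e < H.size(); ++e)
        if (H[e] > 0) lnW[e] -= std::log(static_cast<double>(H[e]));
}
```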

Cited by 23 publications (22 citation statements)
References: 46 publications
“…If, after some time, the histogram H(Ê) of all possible energies is found to be 'sufficiently flat' (typically interpreted as no histogram bin having less than 80% of the average number of entries [8], but see also [50] for a related discussion), the modification factor is reduced as f → √f, and the histogram H(Ê) is reset to an empty state. The algorithm stops if f is 'sufficiently small', for example f_final = exp(10^-8)…”
Section: WL Sampling (mentioning)
confidence: 99%
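
The control flow described in this snippet (the 80% flatness test, the reduction f ← √f, the histogram reset, and the stopping value f_final = exp(10^-8)) can be sketched in a few lines of host-side C++. Only those two constants come from the text; everything else is an illustrative assumption.

```cpp
// Sketch of the Wang-Landau control logic quoted above: flatness test
// with the 80% criterion, modification-factor reduction f <- sqrt(f)
// (i.e. ln f <- ln f / 2), histogram reset, and stopping once
// ln f <= 1e-8, i.e. f <= f_final = exp(10^-8).
#include <algorithm>
#include <numeric>
#include <vector>

bool is_flat(const std::vector<long>& H) {
    double mean = std::accumulate(H.begin(), H.end(), 0.0) / H.size();
    long hmin = *std::min_element(H.begin(), H.end());
    return hmin >= 0.8 * mean;             // no bin below 80% of the average
}

// One control step; returns true while the WL iteration should continue.
bool wl_control_step(std::vector<long>& H, double& lnf) {
    if (is_flat(H)) {
        lnf *= 0.5;                        // f <- sqrt(f)
        std::fill(H.begin(), H.end(), 0L); // reset histogram to empty
    }
    return lnf > 1e-8;                     // stop once f <= f_final
}
```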
“…Provided that enough such thread groups are available, the compute cores are kept constantly busy and the memory latencies are hidden away. Good GPU performance thus requires breaking the work into many threads; optimal performance is often reached only for thread counts in excess of ten times the number of available physical cores [30].…”
Section: GPU Realization (mentioning)
confidence: 99%
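
The oversubscription point made in this snippet can be demonstrated with a hedged CUDA sketch: launch far more threads than there are physical cores, so that while some warps wait on memory, others are scheduled and the latency is hidden. The kernel, array, and sizes are assumptions for illustration, not the paper's code.

```cuda
// Illustrative CUDA launch demonstrating oversubscription: many more
// threads than physical cores keep the schedulers busy and hide
// memory latency behind computation from other warps.
#include <cuda_runtime.h>

__global__ void sweep_kernel(float* spins, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) spins[i] = -spins[i];   // stand-in for a per-walker update
}

int main() {
    const int n = 30720;               // roughly 10x the physical core count
                                       // of the cards discussed here
    float* d_spins;
    cudaMalloc(&d_spins, n * sizeof(float));
    cudaMemset(d_spins, 0, n * sizeof(float));
    const int threads_per_block = 256;
    const int blocks = (n + threads_per_block - 1) / threads_per_block;
    sweep_kernel<<<blocks, threads_per_block>>>(d_spins, n);
    cudaDeviceSynchronize();
    cudaFree(d_spins);
    return 0;
}
```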
“…For the total times per spin-flip, including the time spent on histogram and weight updates, we arrive at peak performances of 0.22 ns and 0.16 ns for the Tesla K20m and GTX Titan Black cards, respectively, which corresponds to a 15-21 times speedup as compared to the performance of an MPI code on a full dual-CPU node with a total of 12 cores (24 hyper-threads) with Intel Xeon E5-2640 CPUs [119]. This optimal performance is found for fully loading the GPUs with threads, i.e., for the maximum occupancy, corresponding to 30 720 threads for the Titan Black and 26 624 threads for the K20m. The total speedup of the parallel implementation also depends on the effect of the parallel calculation on the number of required iterations until convergence, which is found to be slowly decreasing with p [119], at least if a number of equilibration updates in between iterations ensures that the walkers are thermalized with respect to the updated weights before collecting statistics for the next iteration. The total speedup in the time-to-solution for the parallel multicanonical code on GPU as a function of p is shown in Fig. …”
Section: Multicanonical Simulations (mentioning)
confidence: 99%