We show that using the multisplitting algorithm as a preconditioner for conjugate gradient inversion of the domain wall Dirac operator could effectively reduce the inter-node communication cost, at the expense of performing more on-node floating point operations. This method could be useful for supercomputers with far more on-node flops than inter-node communication bandwidth.
The RBC and UKQCD Collaborations have shown that light hadron masses and meson decay constants measured on 2+1 flavor Möbius DWF ensembles generated with the Iwasaki gauge action and a dislocation suppressing determinant ratio (DSDR) term show few percent O(a^2) scaling violations for ensembles with a^{-1} = 1 GeV. We call this combination the ID+MDWF action and this scaling implies that, to a good approximation, these ensembles lie on a renormalization group trajectory, where the form of the action is unchanged and only the bare parameters need to be tuned to stay on the trajectory. Here we investigate whether a single-step APE-like blocking kernel can reproduce this trajectory and test its accuracy via measurements of the light hadron spectrum and non-perturbative renormalization. As we report, we find close matching to the renormalization group trajectory from this simple blocking kernel.
We show that using the multisplitting algorithm as a preconditioner for conjugate gradient inversion of the domain wall fermion Dirac operator could effectively reduce the inter-node communication cost, at the expense of performing more on-node floating point operations. This method could be useful for supercomputers with far more on-node flops than inter-node communication bandwidth.
We show that using the multi-splitting algorithm as a preconditioner for the domain wall Dirac linear operator, arising in lattice QCD, effectively reduces the inter-node communication cost, at the expense of performing more on-node floating point and memory operations. Correctly including the boundary snake terms, the preconditioner is implemented in the QUDA framework, where it is found that utilizing kernel fusion and the tensor cores on NVIDIA GPUs is necessary to achieve a sufficiently performant preconditioner. A reduced-dimension (reduced-L_s) strategy is also proposed and tested for the preconditioner. We find the method achieves lower time to solution than regular CG at high node count despite the additional local computational requirements from the preconditioner. This method could be useful for supercomputers with more on-node flops and memory bandwidth than inter-node communication bandwidth.
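The trade-off described above can be illustrated with a toy sketch. It is not the QUDA implementation and does not involve the domain wall operator or snake terms. It assumes a 1D operator tridiag(-1, 2 + mass, -1) as a stand-in for the normal operator, contiguous blocks as stand-ins for nodes, and exact local solves with zero Dirichlet boundaries as the preconditioner. All function names and parameters (`apply_A`, `local_solve`, `block_precondition`, `pcg`, `mass`, `num_nodes`) are hypothetical. Each outer iteration applies the full operator once, which is where inter-node communication would occur, while the preconditioner uses only on-node data.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply_A(x, mass=0.01):
    """Toy SPD operator tridiag(-1, 2 + mass, -1). Neighbor terms that cross
    a block edge stand in for inter-node (halo) communication."""
    n = len(x)
    y = [(2.0 + mass) * xi for xi in x]
    for i in range(n):
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

def local_solve(r, mass=0.01):
    """Exact solve of the same operator restricted to one block with zero
    Dirichlet boundaries (Thomas algorithm). Uses only on-node data."""
    n, d = len(r), 2.0 + mass
    c, g = [0.0] * n, [0.0] * n
    c[0], g[0] = -1.0 / d, r[0] / d
    for i in range(1, n):
        denom = d + c[i - 1]
        c[i] = -1.0 / denom
        g[i] = (r[i] + g[i - 1]) / denom
    x = [0.0] * n
    x[-1] = g[-1]
    for i in range(n - 2, -1, -1):
        x[i] = g[i] - c[i] * x[i + 1]
    return x

def block_precondition(r, num_nodes, mass=0.01):
    """Apply the local solve independently on each block (assumed 'node')."""
    assert len(r) % num_nodes == 0, "toy sketch assumes equal-size blocks"
    size = len(r) // num_nodes
    z = []
    for k in range(num_nodes):
        z.extend(local_solve(r[k * size:(k + 1) * size], mass))
    return z

def pcg(b, precond=None, tol=1e-8, max_iter=1000, mass=0.01):
    """Preconditioned CG; returns (solution, number of outer iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r) if precond else list(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        Ap = apply_A(p, mass)  # the only step that would need communication
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = precond(r) if precond else list(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    b = [1.0 + math.sin(0.37 * i) for i in range(64)]
    _, it_plain = pcg(b)
    _, it_pre = pcg(b, lambda r: block_precondition(r, num_nodes=4))
    print("outer iterations, plain CG:", it_plain)
    print("outer iterations, block-preconditioned CG:", it_pre)
```

In this toy setting the preconditioned solve is expected to need fewer outer iterations, and hence fewer communicating operator applications, in exchange for extra local work per iteration. With inexact local solves, as in a realistic setup, the preconditioner varies between iterations and a flexible CG variant would likely be the safer choice.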