“…This is very little bandwidth for strong scaling the solvers used for DWF QCD to, say, 1024 nodes, since the local volumes implied by a 1024-node job generally require of order one byte of off-node bandwidth per sustained Flop. To increase local (on-node) floating-point utilization in the (M)DWF conjugate gradient, we have developed the Multisplitting Preconditioned Conjugate Gradient (MSPCG) [5,6]. The Multisplitting algorithm [7] provides general criteria for splitting the solution of a linear system into submatrix pieces: each piece is solved separately, and an update step spanning the submatrices then defines the problem for the next iteration.…”
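
The split, solve, and update pattern described in the excerpt can be illustrated with a minimal sketch, assuming a non-overlapping block-Jacobi splitting of a small test matrix. This is not the MSPCG preconditioner of [5,6]. The toy matrix, the block layout, and the helper names `solve_dense`, `multisplit_iterate`, and `residual_norm` are hypothetical and chosen only for illustration.

```python
def solve_dense(a, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multisplit_iterate(a, b, blocks, sweeps):
    """Block-Jacobi splitting: independent local solves, then a global update."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        x_new = [0.0] * n
        for block in blocks:  # each block could be handled by its own node
            inside = set(block)
            # Couplings to unknowns outside the block go to the right-hand
            # side, evaluated with the previous iterate.
            local_rhs = [
                b[i] - sum(a[i][j] * x[j] for j in range(n) if j not in inside)
                for i in block
            ]
            local_a = [[a[i][j] for j in block] for i in block]
            for i, value in zip(block, solve_dense(local_a, local_rhs)):
                x_new[i] = value
        x = x_new  # update step spanning all blocks
    return x


def residual_norm(a, b, x):
    """Maximum absolute entry of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(a[i][j] * x[j] for j in range(n))) for i in range(n))


# Toy problem (an assumption, not from the source): a tridiagonal, strictly
# diagonally dominant matrix, split into two blocks of four unknowns each.
N = 8
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(N)]
     for i in range(N)]
B = [1.0] * N
BLOCKS = [list(range(0, 4)), list(range(4, 8))]

X = multisplit_iterate(A, B, BLOCKS, sweeps=30)
print(residual_norm(A, B, X))
```

In this sketch each local solve only needs the previous iterate's values on the neighbouring block, so the work inside a sweep is entirely local and the only global step is the assembly of the new iterate. Used as a stationary iteration it converges here because the toy matrix is strictly diagonally dominant; that property is an assumption of the example, not a claim about the (M)DWF operator.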