“…A number of high-performance lattice Boltzmann codes have been developed by various groups, including Palabos [15], waLBerla [16], MUPHY [17], and HARVEY [18]. With the availability of application programming interfaces for general-purpose graphics processing units, there has been increasing interest in GPU implementations of the LBM [19], [20], [21], [22], [23], [14], [24]. These efforts address a variety of aspects including efficient data layouts [22], [14], indirect addressing solutions [19], [23], and multi-GPU implementations [25], [21], [17], [23], [26], [27].…”