Sparse Geometries Handling in Lattice Boltzmann Method Implementation for Graphic Processors

Tomczak, Tadeusz; Szafran, Roman G.

doi:10.1109/tpds.2018.2810237

Cited by 14 publications

(14 citation statements)

References 28 publications

“…Moreover, the workload bounds are set to be in the magnitude of a 2D/3D computational fluid dynamic application involving 10^7 cells per process. The lower bound is 52 FLOP per cells, whereas the upper bound is 1165 FLOP per cells [14]. The number of overloading PEs is at most 20% of the total number of PEs.…”

Section: B Finding the Optimal Load Balancing Intervals (mentioning)

confidence: 98%

On the Benefits of Anticipating Load Imbalance for Performance Optimization of Parallel Applications

Boulmier

Raynaud

Abdennadher

et al. 2019

2019 IEEE International Conference on Cluster Computing (CLUSTER)

In parallel iterative applications, computational efficiency is essential for addressing large problems. Load imbalance is one of the major performance degradation factors of parallel applications. Therefore, distributing, cleverly, and as evenly as possible, the workload among processing elements (PE) maximizes application performance. So far, the standard load balancing method consists in distributing the workload evenly between PEs and, when load imbalance appears, redistributing the extra load from overloaded PEs to underloaded PEs. However, this does not anticipate the load imbalance growth that may continue during the next iterations. In this paper, we present a first step toward a novel philosophy of load balancing that unloads the PEs that will be overloaded in the near future to let the application rebalance itself via its own dynamics. Herein, we present a formal definition of our new approach using a simple mathematical model and discuss its advantages compared to the standard load balancing method. In addition to the theoretical study, we apply our method to an application that reproduces the computation of a fluid model with non-uniform erosion. The performance validates the benefit of anticipating load imbalance. We observed up to 16% performance improvement compared to the standard load balancing method.
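The standard scheme this abstract describes, an even split followed by redistribution once imbalance appears, can be illustrated with a minimal sketch. The function names and the load values are hypothetical and not taken from the paper. The metric used here (maximum load over mean load, minus one) is a common convention and is assumed, not confirmed, to match the authors' definition.

```python
def imbalance(loads):
    """Relative imbalance: max load over mean load, minus one (0.0 means balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean - 1.0


def rebalance_evenly(loads):
    """Standard method: give every PE the same share of the total workload."""
    mean = sum(loads) / len(loads)
    return [mean] * len(loads)


# Hypothetical per-PE workloads in arbitrary units; the last PE is overloaded.
loads = [100.0, 100.0, 100.0, 180.0]
before = imbalance(loads)           # mean is 120, so 180 / 120 - 1 = 0.5
balanced = rebalance_evenly(loads)  # total workload is preserved
after = imbalance(balanced)         # 0.0 once every PE holds the mean load
```

Because an iteration finishes only when the slowest PE finishes, the maximum load sets the iteration time. The sketch only reacts to imbalance that already exists; it does not model the anticipation strategy the paper proposes.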

“…Lattice-Boltzmann computational fluid dynamic problem with 10^9 D2Q9 cells per processing unit with a performance of 1 Gflops [26]. The number of processing units is equal to the number of cores available in the supercomputer "Sunway…”

Section: Synthetic Benchmarks (mentioning)

confidence: 99%

Optimal Load Balancing and Assessment of Existing Load Balancing Criteria

Boulmier

Abdennadher

Chopard

2021

Preprint

Parallel iterative applications often suffer from load imbalance, one of the most critical performance degradation factors. Hence, load balancing techniques are used to distribute the workload evenly to maximize performance. A key challenge is to know when to use load balancing techniques. In general, this is done through load balancing criteria, which trigger load balancing based on runtime application data and/or user-defined information. In the first part of this paper, we introduce a novel, automatic load balancing criterion derived from a simple mathematical model. In the second part, we propose a branch-and-bound algorithm to find the load balancing iterations that lead to the optimal application performance. This algorithm finds the optimal load balancing scenario in quadratic time while, to the best of our knowledge, this has never been addressed in less than an exponential time. Finally, we compare the performance of the scenarios produced by state-of-the-art load balancing criteria relative to the optimal load balancing scenario in synthetic benchmarks and parallel N-body simulations. In the synthetic benchmarks, we observe that the proposed criterion outperforms the other automatic criteria. In the numerical experiments, we show that our new criterion is, on average, 4.9% faster than state-of-the-art load balancing criteria and can outperform them by up to 17.6%. Moreover,

“…Interactions between nodes are entirely linear, while the method's non-linearity enters a local collision process within each node [12]. This property makes the LBM very amenable to high-performance computing on parallel architectures, including GPUs [13,14]. The key advantage of LBM from the viewpoint of microfluidics is the adequacy for simulation of mesoscopic physics and multi-scale effects, which are hard to describe macroscopically.…”

Section: Introduction (mentioning)

confidence: 99%

The Lattice-Boltzmann Modeling of Microflows in a Cell Culture Microdevice for High-Throughput Drug Screening

Szafran

Davykoza

2021

Applied Sciences

Self Cite

The aim of our research was to develop a numerical model of microflows occurring in the culture chambers (CC) of a microfluidic device of our construction for high-throughput drug screening. The incompressible fluid flow model is based on the lattice-Boltzmann equation, with an external body force term approximated by the He-Shan-Doolen scheme and the Bhatnagar-Gross-Krook approximation of the collision operator. The model accuracy was validated by the algebraic solution of the Navier–Stokes equation (NSE) for a fully developed duct flow, as well as experimentally. The mean velocity prediction error for the middle-length cross-section of CC was 1.0%, comparing to the NSE algebraic solution. The mean error of volumetric flow rate prediction was 6.1%, comparing to the experimental results. The analysis of flow hydrodynamics showed that the discrepancies from the plug-flow-like velocity profile are observed close to the inlets only, and do not influence cell cultures in the working area of CC. Within its workspace area, the biochip provides stable and homogeneous fully developed laminar flow conditions, which make the procedures of gradient generation, cell seeding, and cell-staining repeatable and uniform across CC, and weakly dependent on perturbations.
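The citation statement for this paper notes that exchange between nodes is linear while the non-linearity sits in a local collision step. A minimal D2Q9 sketch with the Bhatnagar-Gross-Krook collision operator may make that split concrete. It is an illustration under stated assumptions, not the authors' implementation: the He-Shan-Doolen force term is omitted, boundaries are periodic, and all function names, the grid size, and the relaxation time `tau` are hypothetical.

```python
# Standard D2Q9 discrete velocities and weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide(f, tau):
    """BGK collision on one node: non-linear in f, but uses only that node's data."""
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


def stream(grid):
    """Streaming: a linear copy of each population to its neighbor (periodic wrap)."""
    ny, nx = len(grid), len(grid[0])
    new = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for i, (cx, cy) in enumerate(C):
                new[(y + cy) % ny][(x + cx) % nx][i] = grid[y][x][i]
    return new


def step(grid, tau):
    """One time step: local collision on every node, then streaming."""
    collided = [[collide(f, tau) for f in row] for row in grid]
    return stream(collided)


# Hypothetical setup: small periodic grid with a mild density perturbation.
nx, ny, tau = 8, 6, 0.8
grid = [[equilibrium(1.0 + 0.01 * ((x + y) % 3), 0.02, 0.0) for x in range(nx)]
        for y in range(ny)]
mass_before = sum(sum(f) for row in grid for f in row)
grid = step(grid, tau)
mass_after = sum(sum(f) for row in grid for f in row)
```

Since `collide` touches only one node and `stream` is a plain copy, every node can be updated independently. That independence is what makes the method map well onto GPUs. On a periodic grid the total mass should be unchanged by a step, up to floating-point rounding.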
