2011
DOI: 10.1016/j.camwa.2010.01.054

A new approach to the lattice Boltzmann method for graphics processing units

Abstract: Emerging many-core processors, like CUDA-capable NVIDIA GPUs, are promising platforms for regular parallel algorithms such as the Lattice Boltzmann Method (LBM). Since the global memory of graphics devices shows high latency and LBM is data intensive, the memory access pattern is an important issue for achieving good performance. Whenever possible, global memory loads and stores should be coalesced and aligned, but the propagation phase in LBM can lead to frequent misaligned memory acce…

Cited by 125 publications (81 citation statements)
References 15 publications
“…As shown in [12], the misalignment overhead is significantly higher for store operations than for read operations. We therefore suggested in [13] to use the in-place propagation scheme outlined by Fig. 3 instead of the ordinary out-of-place propagation scheme illustrated in Fig.…”
Section: GPU Implementations of the LBM (mentioning)
confidence: 99%
“…Figures 4 and 5 outline the two propagation schemes (in the two-dimensional case, for the sake of simplicity). It was shown in [17] that the cost of misaligned reads is of the same order of magnitude than the overhead of a rearrange kernel. It should be noted that the in-place propagation approach is simpler and exerts less pressure on hardware than the shared memory approach.…”
Section: GPU Implementation of the LBM (mentioning)
confidence: 99%
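
The two propagation schemes contrasted in the statements above can be illustrated with a small CPU sketch. It assumes that "in-place propagation" means gathering values from neighbouring nodes on load, so that every store goes to the node's own index, while "out-of-place propagation" scatters values to neighbouring nodes on store. The one-dimensional three-velocity lattice, the periodic boundaries, and the names `C`, `propagate_push`, and `propagate_pull` are illustrative assumptions, not taken from the cited papers; the loop index `x` merely stands in for a GPU thread index.

```python
# Hypothetical 1D lattice with three velocities: rest, +x, -x.
C = (0, 1, -1)


def propagate_push(f_src, nx):
    """Scatter on store: read the node's own value, write it to the shifted neighbour."""
    f_dst = [[0.0] * nx for _ in C]
    for i, c in enumerate(C):
        for x in range(nx):  # x stands in for a thread index
            # The store address is shifted by c, so on a GPU the writes are misaligned.
            f_dst[i][(x + c) % nx] = f_src[i][x]
    return f_dst


def propagate_pull(f_src, nx):
    """Gather on load: read from the shifted neighbour, write to the node's own index."""
    f_dst = [[0.0] * nx for _ in C]
    for i, c in enumerate(C):
        for x in range(nx):
            # The load address is shifted by c; the store address is the node's own index.
            f_dst[i][x] = f_src[i][(x - c) % nx]
    return f_dst


if __name__ == "__main__":
    nx = 8
    f = [[10 * i + x for x in range(nx)] for i in range(len(C))]
    # Both variants produce the same propagated field.
    print(propagate_push(f, nx) == propagate_pull(f, nx))
```

Both functions compute the same result; they differ only in whether the shifted address appears on the store or on the load, which is the distinction the citing authors draw when they note that misaligned stores cost more than misaligned reads.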
“…Our implementation is based on the isothermal flow solver described in [17]. The lattice is a rectangular cuboid of dimensions N_x × N_y × N_z.…”
Section: Proposed Implementation (mentioning)
confidence: 99%