2017
DOI: 10.1080/19942060.2017.1317027

A graphics processing unit-accelerated meshless method for two-dimensional compressible flows

Abstract: A graphics processing unit (GPU)-accelerated meshless method is presented for solving two-dimensional compressible flows over aerodynamic bodies. The Compute Unified Device Architecture (CUDA) Fortran programming model is employed to port the meshless method from the central processing unit (CPU) to the GPU in order to achieve efficiency, which involves implementing CUDA kernels and managing the data storage structure and thread hierarchy. The CUDA kernel subroutines are designed to fit the point-based computi…
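
The abstract describes kernels organized around a point-based computation. As a hedged illustration only, the sketch below assumes that each point (a cloud center) is handled by one GPU thread and that spatial derivatives are approximated by an unweighted least-squares fit over the point's cloud of neighbors. This is a common meshless choice, but the truncated abstract does not confirm that it is the paper's scheme. The names `ls_gradient` and `gradient_all` are hypothetical, and the serial loop in `gradient_all` only stands in for "one thread per point".

```python
from typing import List, Sequence, Tuple


def ls_gradient(i: int, x: Sequence[float], y: Sequence[float],
                u: Sequence[float], cloud: Sequence[int]) -> Tuple[float, float]:
    """Least-squares estimate of (du/dx, du/dy) at point i from its cloud.

    Assumption: unweighted least squares on differences relative to the
    cloud center; not necessarily the scheme used in the paper.
    """
    sxx = sxy = syy = sxu = syu = 0.0
    for j in cloud:
        dx, dy, du = x[j] - x[i], y[j] - y[i], u[j] - u[i]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        sxu += dx * du
        syu += dy * du
    # Solve the 2x2 normal equations [[sxx, sxy], [sxy, syy]] g = [sxu, syu].
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-14:
        raise ValueError("degenerate cloud: neighbors are collinear or too few")
    dudx = (syy * sxu - sxy * syu) / det
    dudy = (sxx * syu - sxy * sxu) / det
    return dudx, dudy


def gradient_all(x: Sequence[float], y: Sequence[float], u: Sequence[float],
                 clouds: List[List[int]]) -> List[Tuple[float, float]]:
    # Point-based mapping: iteration i plays the role of GPU thread i,
    # which reads its own cloud and writes only its own result.
    return [ls_gradient(i, x, y, u, clouds[i]) for i in range(len(x))]


if __name__ == "__main__":
    # 3x3 lattice with the linear field u = 2x + 3y; the fit is exact for linear data.
    xs = [float(c) for r in range(3) for c in range(3)]
    ys = [float(r) for r in range(3) for c in range(3)]
    us = [2.0 * a + 3.0 * b for a, b in zip(xs, ys)]
    print(ls_gradient(4, xs, ys, us, [1, 3, 5, 7]))  # prints (2.0, 3.0)
```

Because each point only reads its neighbors and writes its own gradient, there are no write conflicts between threads, which is what makes a one-thread-per-point mapping natural for this kind of kernel.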

Cited by 4 publications (9 citation statements). References: 31 publications.
“…In order to optimize GPU performance, the number of threads per block for each kernel should be carefully tuned. According to our recently reported work [33], 64 threads per block is a reasonable choice for the CUDA kernels. Thus the total number of thread blocks can be determined by

$$\mathrm{gridDim} = \left\lfloor \frac{\mathrm{nTotalThread} + \mathrm{blockDim} - 1}{\mathrm{blockDim}} \right\rfloor \qquad (17)$$

where nTotalThread represents the total number of threads.…”
Section: CUDA Kernel Functions (mentioning; confidence: 99%)
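
Equation (17) is integer ceiling division: it launches just enough blocks of blockDim threads to cover nTotalThread threads, so the last block may contain surplus threads that must do nothing. The Python sketch below reproduces that host-side arithmetic with the 64-threads-per-block value quoted above and simulates a one-dimensional launch with the conventional "global index < total" bounds check. The helper names are hypothetical, the 0-based C-style index is an assumption (CUDA Fortran indices are 1-based), and the bounds check is the usual CUDA idiom rather than code taken from the paper.

```python
from typing import List

BLOCK_DIM = 64  # threads per block, the value reported as reasonable in [33]


def grid_dim(n_total_thread: int, block_dim: int = BLOCK_DIM) -> int:
    """Number of blocks per Eq. (17): floor((n + blockDim - 1) / blockDim)."""
    if n_total_thread <= 0 or block_dim <= 0:
        raise ValueError("thread and block counts must be positive")
    return (n_total_thread + block_dim - 1) // block_dim


def covered_points(n_total_thread: int, block_dim: int = BLOCK_DIM) -> List[int]:
    """Simulate a 1D launch and return the indices of threads that do work."""
    work = []
    for block_idx in range(grid_dim(n_total_thread, block_dim)):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # 0-based global thread index
            if i < n_total_thread:  # surplus threads in the last block skip the work
                work.append(i)
    return work


if __name__ == "__main__":
    n = 1000  # illustrative: 1000 points, one thread each
    print(grid_dim(n), grid_dim(n) * BLOCK_DIM - n)  # prints 16 24
```

With 1000 threads, 16 blocks of 64 give 1024 threads, so 24 threads in the last block are idle; the bounds check is what keeps them from reading or writing past the end of the point arrays.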
“…This pattern is adopted in the present work so that all the threads in a half warp map to and access global memory simultaneously with respect to the center of a meshless cloud. In practice, this means that consecutive threads access consecutive memory addresses [33,34]. The computed results, including Mach number contours and pressure coefficients, are depicted in Fig.…”
Section: Device Memory Management (mentioning; confidence: 99%)
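
Whether consecutive threads touch consecutive addresses depends mainly on how per-point data are laid out. The sketch below is a hedged illustration, not the paper's storage scheme. It compares the byte addresses read by one half warp (16 threads) when the four conserved variables of the two-dimensional Euler equations (assumed here: rho, rho*u, rho*v, rho*E) are stored as an array of structures ("aos") versus a structure of arrays ("soa"), assuming 8-byte double-precision values and one thread per cloud center. All function names are hypothetical.

```python
from typing import List

HALF_WARP = 16  # threads per half warp (a warp is 32 threads)
N_VAR = 4       # assumed: rho, rho*u, rho*v, rho*E for 2D Euler
WORD = 8        # bytes per double-precision value


def aos_address(point: int, var: int) -> int:
    # Array of structures: all variables of one point are adjacent.
    return (point * N_VAR + var) * WORD


def soa_address(point: int, var: int, n_points: int) -> int:
    # Structure of arrays: one contiguous array per variable.
    return (var * n_points + point) * WORD


def half_warp_addresses(layout: str, var: int, first_point: int,
                        n_points: int) -> List[int]:
    """Addresses read when thread t loads variable var of point first_point + t."""
    pts = range(first_point, first_point + HALF_WARP)
    if layout == "aos":
        return [aos_address(p, var) for p in pts]
    if layout == "soa":
        return [soa_address(p, var, n_points) for p in pts]
    raise ValueError(f"unknown layout: {layout}")


def is_contiguous(addrs: List[int]) -> bool:
    # Coalescing-friendly when consecutive threads hit consecutive words.
    return all(b - a == WORD for a, b in zip(addrs, addrs[1:]))


if __name__ == "__main__":
    n = 1000
    for layout in ("aos", "soa"):
        a = half_warp_addresses(layout, var=0, first_point=0, n_points=n)
        print(layout, a[1] - a[0], is_contiguous(a))  # aos 32 False / soa 8 True
```

In the array-of-structures layout the threads of a half warp are strided by N_VAR words, while the structure-of-arrays layout gives the unit stride described in the quotation. The half-warp granularity reflects older devices; newer architectures coalesce per warp, but the layout argument is the same. Reads of neighbor data through a cloud connectivity list are indirect, so their contiguity is generally not guaranteed by the layout alone.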
“…In general, recent developments in the meshless community are vivid, ranging from analyses of computer execution on different platforms [6,12] and reduction of computational cost by introducing a piecewise approximation [13] to the implementation of more complex multi-phase flows [14], and many more. This paper extends the spectrum of published papers with a generalized formulation of a local strong-form meshless method, termed the Meshless Local Strong Form Method (MLSM), enriched with h-refinement [15] and the ability to discretize arbitrary domains [7].…”
Section: Introduction (mentioning; confidence: 99%)