2015 IEEE 22nd International Conference on High Performance Computing (HiPC) 2015
DOI: 10.1109/hipc.2015.24

A Performance Model for GPU-Accelerated FDTD Applications

Cited by 11 publications
(7 citation statements)
References 14 publications
“…The D fields do not exchange between GPUs. This data-exchange approach was applied in [3] and [8], and in the multi-CPU-core FDTD [16]. In MATLAB, this exchange can be defined by a data movement direction within the first SPMD statement.…”
Section: Computational Model and GPU-Based FDTD Methods (mentioning)
confidence: 99%
“…There are two major approaches to programming multi-GPU based parallel FD-FDTD: open computing language (OpenCL) [3,4] and compute unified device architecture (CUDA) [5][6][7][8]. OpenCL is a framework and programming language that executes across heterogeneous platforms consisting of CPUs, GPUs, or other processors.…”
Section: Introduction (mentioning)
confidence: 99%
“…For such networks, a viable theory for locating emergent behaviors in the parameter space (or gene's space) called local activity theory [5] was proposed and successfully tested [4]. Fluid dynamics, sound propagation, and many other physical phenomena can be modeled in FDTD frameworks such as cellular automata and Lattice Boltzmann Machines [6][7] [8]. Such models need convenient informatic implementations (modelling and simulation frameworks = MSF), and in recent years various commercial or noncommercial solutions were offered, most struggling to offer GPU support and high performance (short simulation times for wide arrays of cells).…”
Section: Introduction (mentioning)
confidence: 99%
“…However, this inspection suggests that kernels are bound on the DRAM latency as the requested transactions cannot fully utilize the DRAM resource. In contrast, the Tesla GPUs exhibited higher rates on DRAM utilization due to the ECC protection, which caused a much larger DRAM traffic (3,840,153,10…”
[Table residue omitted: per-kernel listing of the most utilized hardware units (alu_fu, ldst_fu, l2, dram, tex, single_precision_fu) for kernels btr-fnd, btr-rng, bp-adj, bp-fwd, bfs-k1, bfs-k2, and dwt-cpy.]
Section: Special Case Considerations (mentioning)
confidence: 99%