2016
DOI: 10.1016/j.cma.2016.03.038

Finite element numerical integration for first order approximations on multi- and many-core architectures

Cited by 28 publications (13 citation statements)
References: 28 publications

“…Thus, optimizing this kernel to take advantage of current architectures, from the cache hierarchy to the different levels of parallelism, is not straightforward, and the scientific literature dealing with this topic is abundant. For instance, optimized implementations on GPUs have been described in [2], [3], [4]. Most of these approaches implement a mesh-coloring strategy and fully benefit from the memory bandwidth available on the underlying architecture.…”
Section: Related Work (mentioning)
confidence: 99%
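The mesh-coloring strategy mentioned above can be illustrated with a short sketch. It assumes elements are given as tuples of global node indices; the helpers `color_elements` and `assemble_by_color` are hypothetical names, and the greedy coloring shown is a generic textbook scheme, not the specific method of [2], [3] or [4]. Elements that share a node receive different colors, so all elements of one color can scatter their contributions concurrently without write conflicts.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: elements that share a node never get the same color.

    `elements` is a list of tuples of global node indices (three per linear
    triangle in this sketch). Returns one color index per element.
    """
    # Map each node to the elements that touch it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbours (elements sharing any node with e).
        taken = {colors[nb] for n in nodes for nb in node_to_elems[n] if colors[nb] >= 0}
        c = 0
        while c in taken:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, colors, local_values, n_nodes):
    """Scatter per-element nodal contributions into a global vector, color by color.

    Elements of one color touch disjoint node sets, so the inner loop could be
    executed in parallel (e.g. one GPU thread per element) without atomics.
    Here it runs sequentially for clarity.
    """
    global_vec = [0.0] * n_nodes
    for c in range(max(colors) + 1):
        for e, nodes in enumerate(elements):
            if colors[e] != c:
                continue
            for a, n in enumerate(nodes):
                global_vec[n] += local_values[e][a]
    return global_vec


# Two unit squares, each split into two triangles (nodes 0-2 bottom, 3-5 top).
elems = [(0, 1, 4), (0, 4, 3), (1, 2, 5), (1, 5, 4)]
cols = color_elements(elems)
print(cols)  # [0, 1, 1, 2]
print(assemble_by_color(elems, cols, [[1.0] * 3] * 4, 6))  # [2.0, 3.0, 1.0, 1.0, 3.0, 2.0]
```

The greedy pass does not guarantee the minimum number of colors; since each color is a separate sequential phase, the element ordering used for coloring affects how much parallelism each phase exposes.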
“…Most of these approaches implement a mesh-coloring strategy and fully benefit from the memory bandwidth available on the underlying architecture. At the shared-memory level, the FEM implementations described in [5], [4], [6] underline the impact of SIMD instructions and data reuse at the cache-memory level. Additionally, the advanced algorithms described in [7] introduced a divide-and-conquer methodology to build a tree of dependent tasks.…”
Section: Related Work (mentioning)
confidence: 99%
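The divide-and-conquer idea attributed to [7] can be sketched, under simplifying assumptions, as recursive coordinate bisection of the element set. This is an illustrative reconstruction, not the algorithm of [7]: the function `build_task_tree` and its dictionary layout are hypothetical. At each level the elements are split in half by centroid; elements touching nodes shared by both halves are moved to a separator task, so the two remaining halves write to disjoint nodes and can run concurrently, while the separator depends on both.

```python
def build_task_tree(elements, coords, min_size=2, depth=0):
    """Recursive coordinate bisection of a mesh into a tree of assembly tasks.

    elements: list of (element_id, node_tuple); coords: dict node -> (x, y).
    Returns {"leaf": [ids]} or {"left": ..., "right": ..., "sep": [ids]}.
    "left" and "right" touch disjoint node sets, so they may run concurrently;
    the separator "sep" must run after both of them. min_size must be >= 1.
    """
    if len(elements) <= min_size:
        return {"leaf": [eid for eid, _ in elements]}
    axis = depth % 2  # alternate between x and y cuts

    def centroid(item):
        _, nodes = item
        return sum(coords[n][axis] for n in nodes) / len(nodes)

    ordered = sorted(elements, key=centroid)
    half = len(ordered) // 2
    left, right = ordered[:half], ordered[half:]
    left_nodes = {n for _, nodes in left for n in nodes}
    right_nodes = {n for _, nodes in right for n in nodes}
    shared = left_nodes & right_nodes
    # Elements touching an interface node are moved to the separator.
    sep = [eid for eid, nodes in ordered if shared & set(nodes)]
    left = [item for item in left if not shared & set(item[1])]
    right = [item for item in right if not shared & set(item[1])]
    return {
        "left": build_task_tree(left, coords, min_size, depth + 1),
        "right": build_task_tree(right, coords, min_size, depth + 1),
        "sep": sep,
    }


# A 4x1 strip of unit squares, two triangles per square.
coords = {i: (float(i), 0.0) for i in range(5)}
coords.update({5 + i: (float(i), 1.0) for i in range(5)})
elements = []
for c in range(4):
    elements.append((2 * c, (c, c + 1, c + 6)))
    elements.append((2 * c + 1, (c, c + 6, c + 5)))
print(build_task_tree(elements, coords))
# {'left': {'leaf': [1, 0]}, 'right': {'leaf': [7, 6]}, 'sep': [3, 2, 5, 4]}
```

Executing the tree in post-order (children first, then the separator) respects every dependency; within a leaf, elements can additionally be processed with SIMD-friendly loops that reuse cached node data.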
“…free_if(.false.))
!$omp parallel do private(ik, i, k)
do ik = 1, ik_total
   i = (ik - 1)/(kd - 1) + 1
   k = mod(ik - 1, kd - 1) + 1
   call Fluxj_mic(mbc_n, i, k …
… nested loops with fluxes computing. Therefore, we merge the two loops inside to provide a larger data set for vectorization on MIC.…”
Section: Vectorization (mentioning)
confidence: 99%
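The quoted listing merges two nested loops into one loop over a single index `ik` and recovers `i` and `k` with integer division and `mod`. A minimal check of that mapping, written in Python with 1-based indices to mirror the Fortran, is sketched below. The quote does not show how `ik_total` is defined, so the sketch assumes `ik_total = n_i * (kd - 1)`, where `n_i` is a hypothetical trip count of the original outer loop; `collapsed_pairs` and `nested_pairs` are illustrative names.

```python
def collapsed_pairs(n_i, kd):
    """(i, k) pairs visited by the collapsed loop in the quoted listing.

    Mirrors the 1-based Fortran index recovery
        i = (ik - 1)/(kd - 1) + 1      (integer division)
        k = mod(ik - 1, kd - 1) + 1
    assuming ik_total = n_i * (kd - 1), where n_i is a hypothetical
    trip count of the original outer i loop.
    """
    ik_total = n_i * (kd - 1)
    pairs = []
    for ik in range(1, ik_total + 1):
        i = (ik - 1) // (kd - 1) + 1
        k = (ik - 1) % (kd - 1) + 1
        pairs.append((i, k))
    return pairs


def nested_pairs(n_i, kd):
    """(i, k) pairs visited by the original doubly nested loop."""
    return [(i, k) for i in range(1, n_i + 1) for k in range(1, kd)]


print(collapsed_pairs(3, 4) == nested_pairs(3, 4))  # True
print(len(collapsed_pairs(3, 4)))  # 9
```

Flattening gives the OpenMP runtime and the vectorizer a single iteration space of `n_i * (kd - 1)` iterations instead of two shorter ones. OpenMP also provides a `collapse(2)` clause on loop constructs that asks the compiler to perform this flattening for perfectly nested loops.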
“…Wang et al. [8] reported a large-scale computation with a high-order CFD code on the Tianhe-2 supercomputer, which consists of both CPUs and MIC coprocessors. Other CFD-related work on the Intel MIC architecture can be found in [9][10][11][12]. Working as coprocessors, GPUs have also been popular in CFD.…”
Section: Introduction (mentioning)
confidence: 99%
“…Brook et al. detailed their early efforts to port and optimize scientific and engineering application codes for the Intel MIC architecture. Banaś et al. investigated the performance of the finite element numerical integration algorithm on three processor architectures popular in scientific computing: the classical x86_64 CPU, the Intel Xeon Phi, and the NVIDIA Kepler GPU. Kahale et al. explored the Navier–Stokes (NS) equations and their solution with a multigrid method on the Intel Xeon Phi accelerator.…”
Section: Introduction (mentioning)
confidence: 99%
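To make "finite element numerical integration for first order approximations" concrete, the sketch below computes element stiffness matrices for linear (P1) triangles. It assumes the Laplace operator as a model problem and 2D triangles; the paper's actual kernels, element types, and test problems may differ, and `triangle_stiffness` and `integrate_all` are hypothetical names. For P1 elements the shape-function gradients are constant, so a single quadrature point weighted by the element area integrates the bilinear form exactly.

```python
def triangle_stiffness(p0, p1, p2):
    """Element stiffness matrix for -Laplace(u) on a linear (P1) triangle.

    Gradients of P1 shape functions are constant on the element, so a single
    quadrature point with weight = area integrates grad N_a . grad N_b exactly.
    """
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    # Jacobian determinant of the reference-to-physical map (= 2 * signed area).
    det_j = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    area = 0.5 * abs(det_j)
    # Constant gradients of the barycentric shape functions N0, N1, N2.
    grads = [
        ((y1 - y2) / det_j, (x2 - x1) / det_j),
        ((y2 - y0) / det_j, (x0 - x2) / det_j),
        ((y0 - y1) / det_j, (x1 - x0) / det_j),
    ]
    return [[area * (ga[0] * gb[0] + ga[1] * gb[1]) for gb in grads] for ga in grads]


def integrate_all(coords, elements):
    """Numerical integration over the whole mesh.

    Iterations are independent of each other, so this loop can be distributed
    across CPU cores, Xeon Phi threads, or GPU threads; conflicts only arise
    later, when element matrices are assembled into the global matrix.
    """
    return [triangle_stiffness(*(coords[n] for n in elem)) for elem in elements]


print(triangle_stiffness((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
# [[1.0, -0.5, -0.5], [-0.5, 0.5, 0.0], [-0.5, 0.0, 0.5]]
```

Because each row of a P1 stiffness matrix sums to zero (constants lie in the kernel of the gradient), that property is a cheap sanity check when porting such a kernel to a new architecture.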