2018 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) 2018
DOI: 10.1109/pdp2018.2018.00043

Data-Layout Reorganization for an Efficient Intra-Node Assembly of a Spectral Finite-Element Method

Abstract: The Finite-Element Method (FEM) is routinely used to solve Partial Differential Equations (PDEs) in various scientific domains. For seismic wave modeling, the Spectral Element Method (SEM), a specific formulation of the classical FEM approach, has gained significant attention over the last two decades. This is explained both by the very good numerical accuracy of the method and by the parallel performance of classical MPI-based implementations, which scale up to several tens of thousands computing …

Cited by 1 publication (4 citation statements)
References 12 publications
“…The kernel extracted corresponds to the computation of the internal forces and represents a maximum of 90% of the total elapsed time. This paper extends previous works dealing with mono-core vectorization [11] and data-layout reorganization [12]. Our contributions in this paper are as follows:…”
Section: Introduction (supporting)
confidence: 72%
“…The support of the AVX512 allows the hardware to compute 16 single precision floats by instruction. We recall from [12] that automatic optimizations provided by the compilers hardly reach 140 GFLOPS on both platforms, this represents less than 5% of the theoretical peak performance. Figure 9 compares the performance between 256-bit and 512-bit SIMD instructions.…”
Section: Impact Of the Vectorization (mentioning)
confidence: 91%