2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2017
DOI: 10.1109/ipdps.2017.33

Optimization and Parallelization of B-Spline Based Orbital Evaluations in QMC on Multi/Many-Core Shared Memory Processors

Abstract: B-spline based orbital representations are widely used in Quantum Monte Carlo (QMC) simulations of solids, historically taking as much as 50% of the total run time. Random accesses to a large four-dimensional array make it challenging to efficiently utilize caches and wide vector units of modern CPUs. We present node-level optimizations of B-spline evaluations on multi/many-core shared memory processors. To increase SIMD efficiency and bandwidth utilization, we first apply data layout transformation from array…

Citations: cited by 6 publications (8 citation statements)
References: 16 publications

“…Residual and Gradient Data Layout: We implement a variant of the hybrid/tiled Array-of-Structs (Array-of-Structs-of-Arrays (AoSoA)) [21], [22], [23] data structure layout to store the residual and gradient vectors. For simplicity, we call it Array-of-Structs-of-Strided-Arrays memory layout, in which the struct's arrays are tiled (see Figure 2).…”
Section: Geometric Data Layout (mentioning)
confidence: 99%
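
To make the layout named in the statement above concrete, here is a minimal C++ sketch of a generic AoSoA container. It is not taken from the citing paper or from QMCPACK: the names `kTile`, `GradTile`, and `GradAoSoA`, the three-component gradient, and the tile width of 8 are all assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile width; in practice it would be matched to the SIMD width.
constexpr std::size_t kTile = 8;

// One tile: a struct whose fields are short contiguous arrays of kTile values.
struct GradTile {
    std::array<double, kTile> gx{};
    std::array<double, kTile> gy{};
    std::array<double, kTile> gz{};
};

// AoSoA container: an array of tiles, each tile a struct of arrays.
class GradAoSoA {
public:
    explicit GradAoSoA(std::size_t n)
        : n_(n), tiles_((n + kTile - 1) / kTile) {}

    std::size_t size() const { return n_; }
    std::size_t num_tiles() const { return tiles_.size(); }

    // Element i lives in tile i / kTile, at lane i % kTile.
    double& gx(std::size_t i) { assert(i < n_); return tiles_[i / kTile].gx[i % kTile]; }
    double& gy(std::size_t i) { assert(i < n_); return tiles_[i / kTile].gy[i % kTile]; }
    double& gz(std::size_t i) { assert(i < n_); return tiles_[i / kTile].gz[i % kTile]; }

    // Scale every component. The inner loops run over unit-stride lanes
    // of one field at a time, which is the access pattern compilers vectorize well.
    void scale(double a) {
        for (GradTile& t : tiles_) {
            for (std::size_t l = 0; l < kTile; ++l) t.gx[l] *= a;
            for (std::size_t l = 0; l < kTile; ++l) t.gy[l] *= a;
            for (std::size_t l = 0; l < kTile; ++l) t.gz[l] *= a;
        }
    }

private:
    std::size_t n_;                // number of logical elements
    std::vector<GradTile> tiles_;  // padded up to a whole number of tiles
};
```

Compared with a plain array of structs, each field of a tile is contiguous, so a loop over lanes touches unit-stride memory; compared with a plain struct of arrays, the three fields of one element stay close together in memory.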
“…Additional physics modules use different sub-resolution models, to properly treat processes which are far below the resolution limit in galaxy simulations. Our work is performed on a code version dubbed as P-GADGET3 [14], based on the latest public release GADGET-2 [13]. Some of the core parts of the code recently underwent a modernization, as reported by [6].…”
Section: P-gadget3 (mentioning)
confidence: 99%
“…The optimization of simulation code, or the ab initio development of modern applications have arisen to be an urgent task in the scientific community [1], [2], [3], [4], [5], [6], [7]. An underlying and often implicit point in this process is the identification of a target computing architecture for the optimization.…”
Section: Introduction (mentioning)
confidence: 99%
“…We use them to narrow the solution space for the optimization and parallelization of QMCPACK. Our previous work [8] showed performance improvement in 3D B-spline routines using a SoA data type. In this work, we implement the SoA data types in the full QMCPACK code for the top kernels.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our previous work [8] demonstrated that tiling of the big B-spline table and parallel execution over the array-of-SoA (AoSoA) objects can reduce the time to complete a QMC step. We propose to extend those ideas to full QMCPACK.…”
Section: Outlook and Future Work (mentioning)
confidence: 99%
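
The tiling idea in the last statement can be sketched as follows. This is a simplified illustration under stated assumptions, not the QMCPACK kernel: the coefficient table is flattened to a list of grid points, the evaluation is reduced to a weighted sum over a few grid points, and `CoefTile`, `make_tiles`, and `evaluate` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One tile holds the coefficients for a contiguous block of splines,
// stored as [grid_point][spline_in_tile] so the innermost loop is unit-stride.
struct CoefTile {
    std::size_t num_points;
    std::size_t width;         // number of splines in this tile
    std::vector<double> coef;  // num_points * width values

    CoefTile(std::size_t np, std::size_t w)
        : num_points(np), width(w), coef(np * w, 0.0) {}

    double& at(std::size_t p, std::size_t s) { return coef[p * width + s]; }
    double at(std::size_t p, std::size_t s) const { return coef[p * width + s]; }
};

// Split num_splines splines into tiles of at most tile_width splines each.
std::vector<CoefTile> make_tiles(std::size_t num_points,
                                 std::size_t num_splines,
                                 std::size_t tile_width) {
    assert(tile_width > 0);
    std::vector<CoefTile> tiles;
    for (std::size_t first = 0; first < num_splines; first += tile_width) {
        tiles.emplace_back(num_points, std::min(tile_width, num_splines - first));
    }
    return tiles;
}

// out[s] = sum_k weights[k] * coef[points[k]][s], computed tile by tile.
// Each tile writes a disjoint slice of `out`, so the outer loop has no
// cross-tile dependencies and could be distributed over threads.
std::vector<double> evaluate(const std::vector<CoefTile>& tiles,
                             const std::vector<std::size_t>& points,
                             const std::vector<double>& weights) {
    assert(points.size() == weights.size());
    std::size_t total = 0;
    for (const CoefTile& t : tiles) total += t.width;

    std::vector<double> out(total, 0.0);
    std::size_t offset = 0;
    for (const CoefTile& t : tiles) {
        for (std::size_t k = 0; k < points.size(); ++k) {
            for (std::size_t s = 0; s < t.width; ++s) {
                out[offset + s] += weights[k] * t.at(points[k], s);
            }
        }
        offset += t.width;
    }
    return out;
}
```

Choosing the tile width trades working-set size against loop length: a smaller tile keeps the rows touched by one evaluation in cache, while a larger tile gives the vectorized inner loop more work per grid point.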