Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores 2015
DOI: 10.1145/2712386.2712391

Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

Abstract: In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism, coupled with a reduced memory capacity, demands an altogether different approach. In this paper we explore augmenting two NWChem modules, the triples correction of CCSD(T) and Fock matrix construction, with OpenMP so that they can run efficiently on future manycore architectures. As th…
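The following C sketch roughly illustrates the shift the abstract describes. It is not code from the paper: it simply threads one loop with OpenMP. The function name `weighted_sum` and its inputs are hypothetical. All threads read a single shared copy of the inputs, which avoids replicating the data in several MPI processes on the same chip. The `reduction(+:total)` clause combines the per-thread partial sums.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not NWChem code: sum of n independent
 * contributions x[i] * w[i].
 * x and w are shared by all threads, so the process holds one copy of the
 * input regardless of the thread count; replicating the same data in every
 * MPI rank on a chip would multiply its memory footprint.
 * reduction(+:total) gives each thread a private partial sum and adds the
 * partial sums together when the loop finishes. */
double weighted_sum(const double *x, const double *w, size_t n) {
    assert(n == 0 || (x != NULL && w != NULL));
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i) {
        total += x[i] * w[i];
    }
    return total;
}
```

If the code is compiled without OpenMP support (for example, without `-fopenmp` in GCC), the pragma is ignored. The loop then runs serially and gives the same result up to floating-point summation order.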

Cited by 10 publications (7 citation statements)
References 19 publications
“…The approach was tested in the FMO and Community Earth System Model (CESM) packages. Shan et al. (2014, 2015) applied OpenMP task parallelism to HF SCF and CCSD(T) drivers.…”
Section: Related Work (mentioning), confidence: 99%
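The task-based approach attributed to Shan et al. can be sketched generically. This is an illustration under stated assumptions, not their code. `process_block` is a hypothetical stand-in for irregular work whose cost grows with the block index. Inside `single`, one thread creates a task per block. The other threads in the team execute those tasks as they become available, which helps balance uneven block costs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical irregular work item: block b costs b + 1 units
 * and returns b + 1. */
static double process_block(size_t b) {
    double acc = 0.0;
    for (size_t k = 0; k <= b; ++k) {
        acc += 1.0;
    }
    return acc;
}

/* One thread (inside single) creates a task per block; any thread in the
 * team may execute a task. Each task writes only its own slot of results[],
 * so no further synchronization is needed before taskwait. */
void process_all_blocks(double *results, size_t n_blocks) {
    assert(results != NULL || n_blocks == 0);
    #pragma omp parallel
    #pragma omp single
    {
        for (size_t b = 0; b < n_blocks; ++b) {
            #pragma omp task firstprivate(b) shared(results)
            results[b] = process_block(b);
        }
        /* Wait for all tasks created above before leaving the region. */
        #pragma omp taskwait
    }
}
```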
“…While in some cases, the addition of OpenMP threads improves performance, neither NWChem nor the Global Arrays toolkit is completely thread-safe. The use of OpenMP with NWChem CCSD(T) calculations was shown to improve performance, but extensive changes were required in every routine accessed during the calculations, and even variables within nested loops had to be updated.…”
Section: Introduction (mentioning), confidence: 99%
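The quoted passage says that even variables within nested loops had to be updated. One common cause of such changes, which is an assumption about what the passage refers to, is that OpenMP shares temporaries declared outside a parallel loop by default. The hypothetical `matvec` below keeps its inner-loop index `j` and accumulator `tmp` at function scope, as older code often does. It must therefore list them in a `private` clause to avoid a data race.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[i] = sum_j a[i*n + j] * x[j] for an
 * n x n row-major matrix.
 * j and tmp are declared at function scope, so without private(j, tmp)
 * all threads would share one copy of each and race on them. The outer
 * loop index i is private automatically. */
void matvec(const double *a, const double *x, double *out, size_t n) {
    assert(n == 0 || (a != NULL && x != NULL && out != NULL));
    size_t j;
    double tmp;
    #pragma omp parallel for private(j, tmp)
    for (long i = 0; i < (long)n; ++i) {
        tmp = 0.0;
        for (j = 0; j < n; ++j) {
            tmp += a[(size_t)i * n + j] * x[j];
        }
        out[i] = tmp;
    }
}
```

In C, declaring `j` and `tmp` inside the loop body would make them private automatically, which is usually the more idiomatic fix.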
“…Moreover, in [34], the authors discussed the optimization of NWChem for Intel's MIC architecture and highlighted the need for tensor computations of about 200-2,000 matrices from 10 × 10 to 40 × 40 in size. In his dissertation, David Ozog discussed NWChem's Tensor Contraction Engine (TCE) and revealed how strongly it relies on the performance of general matrix-matrix multiplication (GEMM) in the computation of the tensor contraction.…”
Section: Introduction (mentioning), confidence: 99%
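The quoted passage describes many independent products of small matrices. One way to thread such a workload is across the batch rather than inside each product, because a single 10 × 10 to 40 × 40 GEMM gives each thread very little work. The sketch below makes several assumptions:

- The matrices are square, row-major, and packed contiguously.
- `small_gemm` is a naive stand-in for a tuned BLAS `dgemm` call.
- `batched_gemm` is a hypothetical name, not the API of any specific library.

```c
#include <assert.h>
#include <stddef.h>

/* Naive row-major C = A * B for one n x n matrix; a stand-in for a
 * tuned BLAS dgemm call. */
static void small_gemm(const double *A, const double *B, double *C, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (size_t k = 0; k < n; ++k) {
                s += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = s;
        }
    }
}

/* Hypothetical batched GEMM: 'batch' independent products, with matrix p
 * stored at offset p * n * n in each array. Threads split the batch, so
 * each thread performs whole small products and no two threads write the
 * same output matrix. */
void batched_gemm(const double *A, const double *B, double *C,
                  size_t n, size_t batch) {
    assert(batch == 0 || (A != NULL && B != NULL && C != NULL));
    #pragma omp parallel for schedule(static)
    for (long p = 0; p < (long)batch; ++p) {
        size_t off = (size_t)p * n * n;
        small_gemm(A + off, B + off, C + off, n);
    }
}
```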