2018
DOI: 10.1051/epjconf/201817502007

DD-αAMG on QPACE 3

Abstract: We describe our experience porting the Regensburg implementation of the DD-αAMG solver from QPACE 2 to QPACE 3. We first review how the code was ported from the first generation Intel Xeon Phi processor (Knights Corner) to its successor (Knights Landing). We then describe the modifications in the communication library necessitated by the switch from InfiniBand to Omni-Path. Finally, we present the performance of the code on a single processor as well as the scaling on many nodes, where in both cases the speedu…

Cited by 21 publications (21 citation statements)
References 10 publications (19 reference statements)
“…We used a modified version of the Chroma [104] software package, along with the LibHadronAnalysis library and, depending on the target machine, either the multigrid DD-αAMG solver [105] implementation of refs. [106,107] or the domain decomposition solver of openQCD [55] (https://luscher.web.cern.ch/luscher/openQCD/). Most gauge ensembles have been generated by CLS (https://wiki-zeuthen.desy.de/CLS/) using openQCD.…”
Section: Discussion (mentioning)
confidence: 99%
“…Furthermore, we linked R with the Intel Math Kernel Library for threaded and vectorized matrix operations. We ran the algorithm on 25 nodes of our QPACE 3 machine Georg et al (2017) with 8 MPI tasks per node and 32 hardware threads per task, where each thread can use two AVX512 vector units. In 16 hours, 5086 iterations were finished, after which the loss (3) was stable to within 1%.…”
Section: High-performance Computing-empowered Loss-function Learning (mentioning)
confidence: 99%
“…A more cost- and work-efficient alternative to single-cell assays is a combination of bulk tissue gene expression profiling with digital tissue deconvolution (DTD) (Lu et al, 2003; Abbas et al, 2009; Gong et al, 2011; Qiao et al, 2012; Altboum et al, 2014; Newman et al, 2015; Li et al, 2016). DTD addresses the following inverse problem: given the bulk gene expression profile y of a tissue, what is the cellular composition c of that tissue?…”
Section: Introduction (mentioning)
confidence: 99%
“…An in-depth evaluation of the Intel Omni-Path network for LQCD applications and An implementation of the DD-αAMG multigrid solver on Intel Knights Landing: P. Georg and D. Richtmann presented the porting of their collaboration codebase onto the new QPACE3 machine, based on KNL nodes [24]. More specifically they presented their efforts in porting of the DD-αAMG solver.…”
Section: Contributions (mentioning)
confidence: 99%