We describe our experience porting the Regensburg implementation of the DD-αAMG solver from QPACE 2 to QPACE 3. We first review how the code was ported from the first-generation Intel Xeon Phi processor (Knights Corner) to its successor (Knights Landing). We then describe the modifications to the communication library necessitated by the switch from InfiniBand to Omni-Path. Finally, we present the performance of the code on a single processor as well as the scaling on many nodes, where in both cases the speedup factor is close to the theoretical expectations.

1. Introduction

The lattice QCD (LQCD) community has traditionally been an early adopter of new computing and network architectures. This typically requires major efforts in porting simulation code or even communication libraries. The Regensburg lattice group (RQCD) has been involved in such efforts, as well as in supercomputer development, for more than a decade. While the first computer in the QPACE series [1,2] was based on IBM's Cell processor and an FPGA-based custom interconnect, the subsequent machines use Intel's Xeon Phi series with standard interconnects (see Sec. 2.1). To satisfy the increasing demands of the RQCD physics program we use a state-of-the-art method, DD-αAMG [3], to solve the discretized form of the Dirac equation. The high-performance implementation of this solver on QPACE 2 is described in [4-7]. The present contribution focuses on the software efforts we made to run this implementation efficiently on QPACE 3.

This paper is structured as follows. In Sec. 2 we give an overview of QPACE 3 and highlight the differences to QPACE 2 in terms of processor and network. We discuss the network technology in some detail because it has changed rather drastically. In Sec. 3 we describe how our solver and our communication library were adapted to the new technologies. In Sec. 4 we present single-node and multi-node benchmarks of the solver on QPACE 3 and compare the results with numbers obtained on QPACE 2. In Sec. 5 we conclude and give an outlook on future work.

2. QPACE 3

2.1 Overview

While QPACE 2 [8] is based on the Knights Corner (KNC) version of the Intel Xeon Phi processor series and an FDR InfiniBand network, its successor QPACE 3 utilizes the current Xeon Phi processor, Knights Landing (KNL), together with an Omni-Path network.
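To illustrate one reason the processor port is comparatively smooth: many of the 512-bit intrinsics of the KNC's IMCI instruction set carry over to the KNL's AVX-512. The sketch below is our own minimal example, not code from the solver; the kernel name and the assumptions of 64-byte-aligned inputs and a length divisible by 16 are ours.

#include <immintrin.h>

/* Minimal sketch (not the actual QPACE solver code): a fused
 * multiply-add kernel over single-precision arrays.  The _mm512_*
 * intrinsics used here exist both in the KNC's IMCI instruction set
 * and in the KNL's AVX-512, so such kernels recompile largely
 * unchanged.  Assumed: 64-byte-aligned pointers, n a multiple of 16. */
void fmadd_kernel(float *restrict y, const float *restrict a,
                  const float *restrict x, long n)
{
    for (long i = 0; i < n; i += 16) {
        __m512 va = _mm512_load_ps(a + i);
        __m512 vx = _mm512_load_ps(x + i);
        __m512 vy = _mm512_load_ps(y + i);
        vy = _mm512_fmadd_ps(va, vx, vy);   /* vy = va*vx + vy */
        _mm512_store_ps(y + i, vy);
    }
}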
On many parallel machines, the time LQCD applications spend in communication is a significant contribution to the total wall-clock time, especially in the strong-scaling limit. We present a novel high-performance communication library that can be used as a de facto drop-in replacement for MPI in existing software. Its lightweight nature, which avoids some of the unnecessary overhead introduced by MPI, allows us to improve the communication performance of applications without any algorithmic or complicated implementation changes. As a first real-world benchmark, we make use of the pMR library in the coarse-grid solve of the Regensburg implementation of the DD-αAMG algorithm. On realistic lattices, we see an improvement by a factor of 2 in pure communication time and savings in total execution time of up to 20%.
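The abstract does not spell out the pMR interface, so as an illustration we show the kind of non-blocking MPI point-to-point exchange that such a drop-in library replaces. This is plain MPI, not the pMR API; the neighbor ranks, message size, and tags are assumptions of this sketch.

#include <mpi.h>

/* Illustrative nearest-neighbor halo exchange in one dimension, of the
 * kind pMR is described as replacing.  Standard MPI only; "left" and
 * "right" are the ranks of the two neighbors on a ring. */
void halo_exchange(double *send_l, double *send_r,
                   double *recv_l, double *recv_r,
                   int n, int left, int right, MPI_Comm comm)
{
    MPI_Request req[4];
    /* Receives from the left pair with sends to the right (tag 0),
     * and vice versa (tag 1). */
    MPI_Irecv(recv_l, n, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Irecv(recv_r, n, MPI_DOUBLE, right, 1, comm, &req[1]);
    MPI_Isend(send_r, n, MPI_DOUBLE, right, 0, comm, &req[2]);
    MPI_Isend(send_l, n, MPI_DOUBLE, left,  1, comm, &req[3]);
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
}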
We present details of our implementation of the Wuppertal adaptive algebraic multigrid code DD-αAMG on SIMD architectures, with particular emphasis on the first-generation Intel Xeon Phi processor (Knights Corner, KNC) used in QPACE 2. As a smoother, the algorithm uses a domain-decomposition-based solver previously developed for the KNC in Regensburg. We optimized the remaining parts of the multigrid code and conclude that it is a very good target for SIMD architectures. Some of the remaining bottlenecks can be eliminated by vectorizing over multiple test vectors in the setup, which is discussed in the contribution of Daniel Richtmann.
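A minimal sketch of the test-vector idea (our own illustration under an assumed data layout, not the actual DD-αAMG data structures): storing several test vectors interleaved at each site component makes the innermost loop run contiguously over vectors, so it maps directly onto SIMD lanes.

/* Assumed layout v[site][component][vec]: NV test vectors interleaved,
 * with NV chosen to match the SIMD width.  The innermost loop is
 * contiguous in memory and vectorizes cleanly. */
#define NV 16  /* number of test vectors (assumption of this sketch) */

void scale_add(float *restrict out, const float *restrict in,
               const float *restrict coeff, long nsite, int ncomp)
{
    for (long s = 0; s < nsite; ++s)
        for (int c = 0; c < ncomp; ++c)
            for (int v = 0; v < NV; ++v)   /* SIMD lanes run over test vectors */
                out[(s*ncomp + c)*NV + v] +=
                    coeff[s] * in[(s*ncomp + c)*NV + v];
}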
Optimization of applications for supercomputers of the highest performance class requires parallelization at multiple levels using different techniques. In this contribution we focus on the parallelization of particle-physics simulations through vector instructions. With the advent of the Scalable Vector Extension (SVE) ISA, future ARM-based processors are expected to provide a significant amount of parallelism at this level.
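As an illustration of what SVE-style, vector-length-agnostic code looks like (a generic example of ours, not taken from a particular LQCD code), consider an axpy kernel written with the ACLE SVE intrinsics. The same binary runs on any hardware vector length, with a predicate masking the loop remainder.

#include <arm_sve.h>
#include <stdint.h>

/* Vector-length-agnostic y += a*x using SVE intrinsics.  svcntd()
 * returns the number of doubles per vector at run time; the predicate
 * pg deactivates the lanes beyond n in the final iteration. */
void axpy_sve(double *restrict y, const double *restrict x,
              double a, int64_t n)
{
    for (int64_t i = 0; i < n; i += (int64_t)svcntd()) {
        svbool_t    pg = svwhilelt_b64_s64(i, n);
        svfloat64_t vx = svld1_f64(pg, x + i);
        svfloat64_t vy = svld1_f64(pg, y + i);
        vy = svmla_n_f64_x(pg, vy, vx, a);  /* vy += vx * a */
        svst1_f64(pg, y + i, vy);
    }
}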
Cancer progression can be described by continuous-time Markov chains whose state space grows exponentially in the number of somatic mutations. The age of a tumor at diagnosis is typically unknown. Therefore, the quantity of interest is the time-marginal distribution over all possible genotypes of tumors, defined as the transient distribution integrated over an exponentially distributed observation time. It can be obtained as the solution of a large linear system. However, the sheer size of this system renders classical solvers infeasible. We consider Markov chains whose transition rates are separable functions, allowing for an efficient low-rank tensor representation of the linear system’s operator. Thus we can reduce the computational complexity from exponential to linear. We derive a convergent iterative method using low-rank formats whose result satisfies the normalization constraint of a distribution. We also perform numerical experiments illustrating that the marginal distribution is well approximated with low rank.
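The linear system mentioned above can be made explicit by a standard computation (the notation, Q for the transition-rate matrix, λ for the rate of the exponential observation time, and p_0 for the initial distribution, is ours and not necessarily the paper's):

% Marginal distribution as a linear system.  With transient
% distribution p(t) = e^{tQ} p_0 and observation time t ~ Exp(lambda):
\[
  q \;=\; \int_0^\infty \lambda e^{-\lambda t}\, e^{tQ} p_0 \,\mathrm{d}t
    \;=\; \lambda \left(\lambda I - Q\right)^{-1} p_0,
  \qquad \text{i.e.} \qquad
  \Bigl(I - \tfrac{1}{\lambda} Q\Bigr) q \;=\; p_0,
\]
% where the integral converges because the eigenvalues of Q have
% non-positive real parts.  The dimension of this system grows as 2^n
% in the number n of mutations, which motivates the low-rank solver.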