We describe our experience porting the Regensburg implementation of the DD-αAMG solver from QPACE 2 to QPACE 3. We first review how the code was ported from the first-generation Intel Xeon Phi processor (Knights Corner) to its successor (Knights Landing). We then describe the modifications to the communication library that were necessitated by the switch from InfiniBand to Omni-Path. Finally, we present the performance of the code on a single processor as well as its scaling to many nodes; in both cases the speedup factor is close to theoretical expectations.
1. Introduction

The lattice QCD (LQCD) community has traditionally been an early adopter of new computing and network architectures. This typically requires major effort in porting simulation code or even communication libraries. The Regensburg lattice group (RQCD) has been involved in such efforts, as well as in supercomputer development, for more than a decade. While the first computer in the QPACE series [1,2] was based on IBM's Cell processor and a custom FPGA-based interconnect, the subsequent machines use Intel's Xeon Phi series with standard interconnects (see Sec. 2.1). To meet the increasing demands of the RQCD physics program, we use a state-of-the-art method, DD-αAMG [3], to solve the discretized form of the Dirac equation. The high-performance implementation of this solver on QPACE 2 is described in [4,5,6,7]. The present contribution focuses on the software effort required to run this implementation efficiently on QPACE 3.

This paper is structured as follows. In Sec. 2 we give an overview of QPACE 3 and highlight the differences from QPACE 2 in terms of processor and network. We discuss the network technology in some detail because it has changed rather drastically. In Sec. 3 we describe how our solver and our communication library were adapted to the new technologies. In Sec. 4 we present single-node and multi-node benchmarks of the solver on QPACE 3 and compare the results with numbers obtained on QPACE 2. In Sec. 5 we conclude and give an outlook on future work.
2. QPACE 3

2.1 Overview

While QPACE 2 [8] is based on the Knights Corner (KNC) version of the Intel Xeon Phi processor series and an FDR InfiniBand network, its successor QPACE 3 uses the current Xeon Phi processor,