Facing the challenges of next-generation exascale computing, the National University of Defense Technology has developed a prototype system to explore the opportunities, solutions, and limits on the path toward the next-generation Tianhe system. This paper briefly introduces the prototype system, which is deployed at the National Supercomputer Center in Tianjin and has a theoretical peak performance of 3.15 Pflops. The system comprises 512 compute nodes, each equipped with three proprietary Matrix-2000+ CPUs. It has 98.3 TB of memory and 1.4 PB of storage in total.
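As a quick sanity check on these figures, the sketch below divides the system-wide totals across nodes and CPUs. The even split is an assumption (the abstract reports only aggregate numbers), and decimal units (1 TB = 10^12 bytes) are used throughout.

```python
# Back-of-the-envelope breakdown of the prototype's headline figures.
# ASSUMPTION: peak performance and memory are spread evenly over nodes/CPUs;
# the abstract only gives system-wide totals. Decimal units are used.

TOTAL_PEAK_FLOPS = 3.15e15    # 3.15 Pflops theoretical peak
TOTAL_MEMORY_BYTES = 98.3e12  # 98.3 TB system memory
NODES = 512
CPUS_PER_NODE = 3             # Matrix-2000+ CPUs per node


def per_unit_figures() -> dict:
    """Derive per-node and per-CPU figures from the system totals."""
    cpus = NODES * CPUS_PER_NODE
    return {
        "cpus": cpus,
        "peak_per_node_tflops": TOTAL_PEAK_FLOPS / NODES / 1e12,
        "peak_per_cpu_tflops": TOTAL_PEAK_FLOPS / cpus / 1e12,
        "memory_per_node_gb": TOTAL_MEMORY_BYTES / NODES / 1e9,
    }


if __name__ == "__main__":
    for name, value in per_unit_figures().items():
        print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

Under the even-split assumption this works out to 1,536 CPUs, roughly 6.15 Tflops and about 192 GB of memory per node, or about 2.05 Tflops per Matrix-2000+ CPU.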
The performance gap for high-performance applications has been widening over time. High-level program transformations are critical to improving application performance, and many of them require determining optimal values for transformation parameters, such as loop unrolling factors and blocking sizes. Static approaches derive these values from analytical models, which are increasingly hard to build because of growing architectural complexity and intricate code structures. Recent iterative compilation approaches instead execute different versions of the program on the actual platform and select the one that delivers the best performance, significantly outperforming static compilation approaches. However, their high compilation cost has limited their scope to embedded applications and a small set of math kernels. This paper proposes a combined approach, Combining Model and Iterative Compilation for Program Performance Optimization (CMIC). CMIC first constructs a program optimization transformation model based on hardware performance counters to decide how and when to apply transformations, and then selects the optimal transformation parameters using the Nelder-Mead simplex algorithm. Experimental results show that the approach effectively improves programs' floating-point performance and reduces their runtime, thereby narrowing the performance gap for high-performance applications.
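The abstract does not spell out how CMIC's counter-based model works. As a purely hypothetical illustration of what such a first stage could look like, the sketch below gates transformations on two metrics derived from hardware counters: the L1 data-cache miss rate and instructions per cycle (IPC). The `CounterSample` fields, the thresholds, and the rules are all assumptions, not CMIC's model; in the two-stage scheme described above, only the transformations selected here would then have their parameters tuned by the Nelder-Mead search.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CounterSample:
    """HYPOTHETICAL per-loop hardware counter readings (e.g. from a profiler)."""
    instructions: int
    cycles: int
    l1d_misses: int
    loads: int


# HYPOTHETICAL thresholds; a real model would be calibrated for each platform.
MISS_RATE_THRESHOLD = 0.05  # L1D misses per load
IPC_THRESHOLD = 1.0         # instructions per cycle


def select_transformations(s: CounterSample) -> List[str]:
    """Return the transformations whose parameters are worth tuning for a loop nest."""
    if s.cycles == 0 or s.loads == 0:
        return []                        # no usable measurements
    selected = []
    if s.l1d_misses / s.loads > MISS_RATE_THRESHOLD:
        selected.append("blocking")      # high miss rate: try tiling for locality
    if s.instructions / s.cycles < IPC_THRESHOLD:
        selected.append("unrolling")     # low IPC: try unrolling (heuristic rule)
    return selected


if __name__ == "__main__":
    hot_loop = CounterSample(instructions=8_000_000, cycles=10_000_000,
                             l1d_misses=120_000, loads=1_000_000)
    print(select_transformations(hot_loop))  # ['blocking', 'unrolling']
```

The point of gating is to shrink the space the iterative stage must explore: a loop with good locality and high IPC returns an empty list and costs no extra compile-and-run cycles.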
The performance gap for high-performance applications has been widening over time. High-level program transformations are critical to improving application performance, and many of them require determining optimal values for transformation parameters, such as loop unrolling factors and blocking sizes. Traditional compilers select these parameters using static analytical models. However, complex computer architectures and code behaviors greatly limit the effectiveness of optimizing compilers. Iterative compilation instead determines these parameter values by executing the program with different parameter values and selecting the one with the shortest runtime. It significantly outperforms static compilation approaches and has become an active research topic in the high-performance computing community. However, it is quite time-consuming because the optimization space is huge, so an effective search strategy is crucial. This paper investigates the Nelder-Mead simplex algorithm as a search strategy for iterative compilation parameters. Experimental results indicate that the Nelder-Mead simplex-based search strategy finds parameter values that yield better performance at lower search cost.
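To make the search strategy concrete, here is a minimal Nelder-Mead sketch over two integer parameters, an unroll factor and a tile size. The parameter ranges, the mapping from the unit square to integers by rounding and clamping, and `synthetic_runtime` (a made-up cost standing in for compiling and timing each program variant) are all assumptions; the abstract does not say how the paper encodes its parameters. The coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5) are the standard textbook defaults.

```python
from typing import Callable, Dict, List, Sequence, Tuple

UNROLL_RANGE = (1, 16)  # HYPOTHETICAL unroll factors 1..16
TILE_STEPS = (1, 32)    # HYPOTHETICAL tile sizes 8*k for k = 1..32 (8..256)


def nelder_mead(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                step: float = 0.25, max_iter: int = 60, tol: float = 1e-9,
                alpha: float = 1.0, gamma: float = 2.0,
                rho: float = 0.5, sigma: float = 0.5) -> Tuple[List[float], float]:
    """Standard Nelder-Mead minimisation: reflect, expand, contract, shrink."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex, values = [simplex[k] for k in order], [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break  # all vertices equal: converged or stuck on a plateau
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]
        xr = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
        fr = f(xr)
        if values[0] <= fr < values[-2]:        # accept reflection
            simplex[-1], values[-1] = xr, fr
        elif fr < values[0]:                    # try expansion
            xe = [c + gamma * (r - c) for c, r in zip(centroid, xr)]
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:                                   # outside or inside contraction
            target = xr if fr < values[-1] else worst
            xc = [c + rho * (t - c) for c, t in zip(centroid, target)]
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:                               # shrink towards the best vertex
                best = simplex[0]
                simplex = [best] + [[b + sigma * (q - b) for b, q in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    k = min(range(n + 1), key=lambda i: values[i])
    return simplex[k], values[k]


def to_params(x: Sequence[float]) -> Tuple[int, int]:
    """Map a point in [0,1]^2 to integer (unroll, tile), clamping out-of-range points."""
    u, t = (min(max(v, 0.0), 1.0) for v in x[:2])
    unroll = round(UNROLL_RANGE[0] + u * (UNROLL_RANGE[1] - UNROLL_RANGE[0]))
    tile = 8 * round(TILE_STEPS[0] + t * (TILE_STEPS[1] - TILE_STEPS[0]))
    return unroll, tile


def synthetic_runtime(unroll: int, tile: int) -> float:
    """HYPOTHETICAL stand-in for compile-and-time; minimum at unroll=8, tile=64."""
    return 10.0 + 0.5 * (unroll - 8) ** 2 + ((tile - 64) / 8) ** 2


class MeasuredCost:
    """Caches results so each distinct parameter setting is 'run' only once."""
    def __init__(self) -> None:
        self.cache: Dict[Tuple[int, int], float] = {}

    def __call__(self, x: Sequence[float]) -> float:
        key = to_params(x)
        if key not in self.cache:
            self.cache[key] = synthetic_runtime(*key)
        return self.cache[key]


if __name__ == "__main__":
    cost = MeasuredCost()
    x_best, t_best = nelder_mead(cost, [0.0, 0.0], step=0.25, max_iter=60)
    grid = (UNROLL_RANGE[1] - UNROLL_RANGE[0] + 1) * (TILE_STEPS[1] - TILE_STEPS[0] + 1)
    print("best (unroll, tile):", to_params(x_best), "runtime:", t_best)
    print("distinct runs:", len(cost.cache), "of", grid, "for exhaustive search")
```

Because each distinct setting is cached, `len(cost.cache)` counts how many variants would actually be compiled and run. With at most four evaluations per iteration and `max_iter=60`, that count stays well below the 512 settings an exhaustive sweep of this hypothetical space would need. Rounding makes the landscape piecewise constant, so the simplex can stall on a plateau; restarting from the best point found is one simple remedy.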