SUMMARY

As the number of nodes in high-performance computing (HPC) systems increases, parallel I/O becomes an important issue: collective I/O is the specialized form of parallel I/O that provides single-file-based parallel I/O. Collective I/O in most message passing interface (MPI) libraries follows a two-phase I/O scheme, in which particular processes, namely I/O aggregators, play an important role by performing the communication and I/O operations. This approach, however, was designed for single-core architectures. Because modern HPC systems use multi-core computational nodes, the role of the I/O aggregators needs to be re-evaluated. Although many previous studies have focused on improving the performance of collective I/O, it is difficult to find a study on an assignment scheme for I/O aggregators that considers multi-core architectures. In this research, it was found that the communication cost in collective I/O differs according to the placement of the I/O aggregators when each node hosts multiple I/O aggregators. The performance under two processor affinity rules was measured, and the results demonstrated that the distributed affinity rule, which places the I/O aggregators in different sockets, is appropriate for collective I/O. Because some applications cannot use the distributed affinity rule, the collective I/O scheme was modified to guarantee an appropriate placement of the I/O aggregators under the accumulated affinity rule. The performance of the proposed scheme was examined on two Linux cluster systems, and the results showed that the performance improvements were more pronounced when the computational node of a given cluster system had a complicated architecture. Under the accumulated affinity rule, the proposed scheme improved on the original MPI-IO by up to approximately 26.25% for the read operation and up to approximately 31.27% for the write operation.

key words: collective I/O, parallel I/O, processor affinity
Introduction

As the size of a problem increases, many scientific applications generate a large number of file-I/O operations. Today's parallel programming paradigms provide several I/O methods for scientific applications, and previous studies [2]-[4] have demonstrated the importance of single-file-based parallel I/O, especially collective I/O.

Collective I/O in the message passing interface (MPI) follows the two-phase I/O scheme, which consists of an I/O phase and a data exchange phase [5]. In the two-phase I/O, specialized processes called I/O aggregators are engaged in both phases. In other words, because the role of an I/O aggregator is to collect I/O data from, or distribute it to, the other clients, collective I/O performance can be affected by the capability of the I/O aggregators. In this study, we describe the effect of processor affinity on collective I/O in multi-core cluster systems. In particular, we explain the relationship between the placement of the I/O aggregators in each node and the communication cost in collective I/O.
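
To make the two-phase scheme concrete, the following minimal sketch (not taken from the paper) shows a collective write in C with MPI-IO. The ROMIO hint cb_nodes, which sets the number of I/O aggregators, is a standard hint in MPICH-derived libraries; the file name, buffer size, and hint value here are illustrative assumptions.

    /* Minimal collective-write sketch (illustrative, not the paper's code). */
    #include <mpi.h>

    #define LOCAL_COUNT 1024            /* integers per process (illustrative) */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int buf[LOCAL_COUNT];
        for (int i = 0; i < LOCAL_COUNT; i++)
            buf[i] = rank;

        /* ROMIO hint: cb_nodes sets how many I/O aggregators perform the
           actual file accesses in the two-phase scheme. */
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "cb_nodes", "2");

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

        /* Collective write: each process first exchanges its data with the
           aggregators (data exchange phase), and the aggregators then issue
           large contiguous file requests (I/O phase). */
        MPI_Offset offset = (MPI_Offset)rank * LOCAL_COUNT * sizeof(int);
        MPI_File_write_at_all(fh, offset, buf, LOCAL_COUNT, MPI_INT,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }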
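As a rough illustration of the processor affinity rules discussed above, the following sketch pins the calling process to the cores of one socket on Linux. The linear core-to-socket mapping and the CORES_PER_SOCKET value are assumptions made only for illustration; a real implementation should query the hardware topology (e.g., with hwloc) instead.

    /* Affinity sketch (illustrative): pin the calling process to one socket,
       assuming socket s owns cores s*CORES_PER_SOCKET .. s*CORES_PER_SOCKET+3. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    #define CORES_PER_SOCKET 4           /* machine-dependent; assumed here */

    static int pin_to_socket(int socket_id)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int c = 0; c < CORES_PER_SOCKET; c++)
            CPU_SET(socket_id * CORES_PER_SOCKET + c, &set);
        return sched_setaffinity(0, sizeof(set), &set);  /* 0 = this process */
    }

    int main(void)
    {
        /* Under the distributed rule, each aggregator process on a node
           would be pinned to a different socket; here we pin to socket 0. */
        if (pin_to_socket(0) != 0)
            perror("sched_setaffinity");
        return 0;
    }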