Abstract. This paper introduces the new Open Trace Format (OTF). The first part gives a brief overview of trace format libraries in general and of existing formats and libraries and their features. Next, the important requirements are discussed, in particular efficient parallel and selective access to trace data. The following part presents the design decisions and features of OTF in detail. Finally, an early evaluation of OTF compares storage size for several examples and reports sequential and parallel I/O benchmarks. A conclusion summarizes the results and gives an outlook.
Abstract. Performance optimization remains one of the key issues in parallel computing. Many parallel applications do not benefit from the continually increasing peak performance of today's massively parallel computers, mainly because they have not been designed to operate efficiently on the thousands of processors of today's top-of-the-range systems. Conventional performance analysis is typically restricted to accumulated data on such large systems, severely limiting its use when dealing with real-world performance bottlenecks. Event-based performance analysis can give the detailed insight required, but has to deal with extreme amounts of data, severely limiting its scalability. In this paper, we present an approach for scalable event-driven performance analysis that combines proven tool technology with novel concepts for hierarchical data layout and visualization. This evolutionary approach is being validated by implementing extensions to the performance analysis tool Vampir.
Memory and I/O performance bottlenecks in supercomputing simulations are two key challenges that must be addressed on the road to Exascale. The new byte-addressable persistent non-volatile memory technology from Intel, DCPMM, promises to be an exciting opportunity to break with the status quo, with unprecedented levels of capacity at near-DRAM speeds. Here, we explore the potential of DCPMM in the context of two high-performance scientific applications in terms of outright performance, efficiency and usability for both its Memory and App Direct modes. In Memory mode, we show equivalent performance and better efficiency for a CASTEP simulation that is limited by memory capacity on conventional DRAM-only systems without any changes to the application. For IFS, we demonstrate that a distributed object-store over NVRAM reduces the data contention created in weather forecasting data producer-consumer workflows. In addition, we also present the achievable memory bandwidth performance using STREAM.