SUMMARYIn the early 1990s, researchers at Sandia National Laboratories and the University of New Mexico began development of customized system software for massively parallel 'capability' computing platforms. These lightweight kernels have proven to be essential for delivering the full power of the underlying hardware to applications. This claim is underscored by the success of several supercomputers, including the Intel Paragon, Intel Accelerated Strategic Computing Initiative Red, and the Cray XT series of systems, each having established a new standard for high-performance computing upon introduction. In this paper, we describe our approach to lightweight compute node kernel design and discuss the design principles that have guided several generations of implementation and deployment. A broad strategy of operating system specialization has led to a focus on user-level resource management, deterministic behavior, and scalable system services. The relative importance of each of these areas has changed over the years in response to changes in applications and hardware and system architecture. We detail our approach and the associated principles, describe how our application of these principles has changed over time, and provide design and performance comparisons to contemporaneous supercomputing operating systems.
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to People who were influential in managing the project were: Bill Camp, Ed Barsis, Art Hale, and Neil Pundit While we have tried to be comprehensive in our listing of the people involved, it is very likely that we have missed at least one important contributor. The omission is a reflection of our poor memories and not a reflection of the importance of their contributions. We apologize to the unnamed contributors.4
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.