Hobbes

Brightwell, Ron; Oldfield, Ron; Maccabe, A.B.; Bernholdt, David E.

doi:10.1145/2491661.2481427

Cited by 39 publications

(9 citation statements)

References 30 publications


“…There are two key distinguishing elements to the project: (i) enclaves and (ii) composition. An enclave is a partitioned region given to a particular application or service [2,4]. Enclaves house applications and therefore will be composed to form more complex application instances [4].…”

Section: Discussion (mentioning)

confidence: 99%

“…As high-performance computing (HPC) systems increase in size and complexity, the associated system software faces new challenges to balance performance, usability and robustness. The use of virtualization in HPC has gained attention in recent years [4,9,13,15,20,21,23,24,6], mainly for enabling isolation, customization and resilience abilities. The benefit of having a user-customized execution environment is one advantage [5,21,23].…”

Section: Introduction (mentioning)

confidence: 99%

“…The benefit of having a user-customized execution environment is one advantage [5,21,23]. Also, the ability to provide increased functionality without having to require this in all instances is another use case [4,20]. For example, microkernels have been used on several supercomputers to achieve minimal system-level interference [20], i.e., "the OS should stay out of the way".…”

Section: Introduction (mentioning)

confidence: 99%

“…The HPC community recently introduced the concept of enclave as an operating and runtime system design characteristic for addressing current scalability, resilience and performance limitations at extreme scale. For instance, the Hobbes project, which aims at designing operating system/runtime (OS/R) interfaces for extreme-scale systems [4], defines an enclave as "a partition of the system allocated to a single application or service" and has proposed a design based on system-level virtualization. Figure 1 shows a diagram of the proposed Hobbes software architecture.…”

Section: Introduction (mentioning)

confidence: 99%


What Is the Right Balance for Performance and Isolation with Virtualization in HPC?

Naughton, Smith, Engelmann, et al. 2014. Lecture Notes in Computer Science.


The use of virtualization in high-performance computing (HPC) has been suggested as a means to provide tailored services and added functionality that many users expect from full-featured Linux cluster environments. While the use of virtual machines in HPC can offer several benefits, maintaining performance is a crucial factor. In some instances performance criteria are placed above isolation properties and selective relaxation of isolation for performance is an important characteristic when considering resilience for HPC environments employing virtualization. In this paper we consider some of the factors associated with balancing performance and isolation in configurations that employ virtual machines. In this context, we propose a classification of errors based on the concept of "error zones", as well as a detailed analysis of the trade-offs between resilience and performance based on the level of isolation provided by virtualization solutions. Finally, the results from a set of experiments are presented, that use different virtualization solutions, and in doing so allow further elucidation of the topic.

“…Operating system and run-time research funded by the Exascale Computing Project (ECP) and ASCR (Argo (Perarnau et al, 2013) and Hobbes (Brightwell et al, 2013)) investigates system support for unconventional HPC programming models, support for multiple concurrent runtimes, and advanced virtualization capabilities that could be leveraged to support desired ISDM capabilities. However, as show, HPC platforms still do not support all the capabilities needed for in situ workflows.…”

Section: Computational Platforms (mentioning)

confidence: 99%

Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources

Peterka, Bard, Bennett, et al. 2020. The International Journal of High Performance Computing Applica


In January 2019, the US Department of Energy, Office of Science program in Advanced Scientific Computing Research, convened a workshop to identify priority research directions (PRDs) for in situ data management (ISDM). A fundamental finding of this workshop is that the methodologies used to manage data among a variety of tasks in situ can be used to facilitate scientific discovery from many different data sources—simulation, experiment, and sensors, for example—and that being able to do so at numerous computing scales will benefit real-time decision-making, design optimization, and data-driven scientific discovery. This article describes six PRDs identified by the workshop, which highlight the components and capabilities needed for ISDM to be successful for a wide variety of applications—making ISDM capabilities more pervasive, controllable, composable, and transparent, with a focus on greater coordination with the software stack and a diversity of fundamentally new data algorithms.


An evaluation of the state of time synchronization on leadership class supercomputers

Jones, Ostrouchov, Koenig, et al. 2017. Concurrency and Computation.


Summary

We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, why the current state of the field is inadequate, and propose strategies to address observed shortcomings.

KEYWORDS

clock synchronization, large-scale systems, system software, time service

INTRODUCTION

The trend towards increasing node counts in high-performance computing (HPC) is motivating a move toward greater levels of concurrency in HPC systems. Today's software environment is now being called on to produce new solutions for emerging issues including managing system power, resilience, and performance characteristics. The distributed algorithms that underlie such services operate much more efficiently in the presence of tightly synchronized clocks. For example, tightly synchronized clocks benefit well-known gang scheduling techniques and complex consensus algorithms. To illustrate the point, such time synchronization enables more aggressive assumptions about communication and synchronization patterns, the removal of unnecessary locks, and a wide range of other applications. Clock-based techniques are already frequently deployed in cloud and data center distributed systems for precisely these reasons.

We examined the time synchronization on some of the world's fastest and most powerful machines. These leadership-class systems employ high-end hardware connected by an extremely low-latency, low-jitter, interconnect in a carefully controlled environment, in contrast to widely distributed cloud-based systems based on commodity hardware and networks. Because of this, we assumed that these systems would have more stable, predictable hardware clocks, and close base time agreement using only standard time synchronization systems like Network Time Protocol (NTP). We did not believe that the complex hardware and software techniques used to provide time synchronization in wide-area systems would be necessary in leadership systems.

Our results demonstrate that the actual time uncertainty for leadership-class machines is often unexpectedly large, in some cases over 600 milliseconds despite network latencies of less than two microseconds. Building on this, we set out to thoroughly quantify the magnitude of the time synchronization challenge in leadership-class systems. This study shows that the current time protocol in use, NTP, is not suitable for providing the level of time synchronization necessary for important system software tasks such as coordinated scheduling. Based on this, we conclude
