High Performance Dependable Multiprocessor II

Ramos, J.; Samson, J.R.; Lupia, D.; Troxel, Ian A.; Subramaniyan, Rajagopal; Jacobs, Adam; Greco, J.; Cieslewski, Grzegorz; Curreri, John; Fischer, Mike; Grobelny, Eric; George, Alan; Aggarwal, Vikas A.; Patel, Minesh; Some, Raphael

doi:10.1109/aero.2007.353106

Cited by 28 publications

(5 citation statements)

References 17 publications


“…There are more complex systems involving COTS processors, such as [14] and NASA's Dependable Multiprocessor [15,16], where a cluster of COTS processing nodes is controlled by a reliable computer. In these systems the configurations are not precalculated, but rather every job is scheduled online to a currently available node.…”

Section: Configuration Management (mentioning)

confidence: 99%

Model-Based Reconfiguration Planning for a Distributed On-board Computer

Kovalov

Franz

Watolla

et al. 2020

Proceedings of the 12th System Analysis and Modelling Conference


The ScOSA project (Scalable On-board Computing for Space Avionics) of the German Aerospace Center aims at combining radiation-hardened space hardware with unreliable but high-performance COTS (commercial off-the-shelf) components as the processing nodes in a heterogeneous on-board network, in order to provide future space missions with the necessary processing capabilities. However, such a system needs to cope with node failures. Our approach is to use a static reconfiguration graph that controls how software tasks are mapped to the processing nodes, and how this mapping should change in response to possible node failures. In this paper we present a model-based approach and a tool for automatic generation of reconfiguration graphs. Based on the software and hardware models, we traverse the graph of all possible failure situations. For every node of this graph we solve a combinatorial optimization problem of mapping tasks to processing nodes, either with an SMT solver or using a genetic algorithm. The resulting reconfiguration graph can then be translated into the configuration files that are deployed on the target system, eliminating the need for tedious and error-prone manual configuration design.
CCS Concepts: • Software and its engineering → Model-driven software engineering; System modeling languages; • Mathematics of computing → Combinatorial optimization.
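
The traversal described in this abstract can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the ScOSA tool: the task demands, node capacities, the capacity-only constraint and the `max_failures` bound are all hypothetical, and an exhaustive search that minimizes peak node load stands in for the SMT solver or genetic algorithm named in the paper.

```python
from collections import deque
from itertools import product

# Hypothetical example model (not from the paper): CPU demand per software
# task and CPU capacity per processing node.
TASKS = {"camera": 2, "compress": 3, "downlink": 1}
NODES = {"n1": 4, "n2": 4, "n3": 2}


def find_mapping(alive):
    """Return a task -> node mapping on the alive nodes, or None.

    Exhaustive search that respects node capacity and minimises the peak
    node load; a stand-in for the SMT solver or genetic algorithm.
    """
    tasks = sorted(TASKS)
    nodes = sorted(alive)
    best, best_peak = None, None
    for assignment in product(nodes, repeat=len(tasks)):
        load = dict.fromkeys(nodes, 0)
        for task, node in zip(tasks, assignment):
            load[node] += TASKS[task]
        if any(load[n] > NODES[n] for n in nodes):
            continue  # this assignment overloads some node
        peak = max(load.values())
        if best_peak is None or peak < best_peak:
            best, best_peak = dict(zip(tasks, assignment)), peak
    return best


def build_reconfiguration_graph(max_failures=2):
    """Breadth-first traversal of failure situations.

    Each graph node is a frozenset of failed processing nodes; it stores the
    task mapping for that situation and one edge per further node failure.
    Situations without a valid mapping are kept but not expanded.
    """
    start = frozenset()
    graph = {}
    queue = deque([start])
    seen = {start}
    while queue:
        failed = queue.popleft()
        alive = set(NODES) - failed
        mapping = find_mapping(alive)
        edges = {}
        if mapping is not None and len(failed) < max_failures:
            for node in sorted(alive):
                nxt = failed | {node}  # frozenset | set -> frozenset
                edges[node] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        graph[failed] = (mapping, edges)
    return graph


if __name__ == "__main__":
    for failed, (mapping, edges) in build_reconfiguration_graph().items():
        print(sorted(failed), mapping, sorted(edges))
```

Because the graph is computed offline, the on-board system only has to follow an edge when a node fails; the paper's further step of translating the graph into deployable configuration files is omitted here.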

“…The Remote Exploration and Experimentation project conducted at NASA was among the first to consider putting a COTS‐based parallel machine into space and address the resulting problems related to application‐adaptive fault tolerance. More recently, NASA's Millennium ST‐8 project developed a ‘Dependable Multiprocessor’ around a COTS‐based cluster using the IBM PowerPC 750FX (IBM, Somers, NY, USA) as a data processor, with a Xilinx VirtexII 6000 FPGA coprocessor (Xilinx, San Jose, CA, USA) for the support of application‐specific modules for digital signal processing, data compression, and vector processing. A centralized system controller for the cluster is implemented using a redundant configuration of radiation‐hardened Motorola processors.…”

Section: Related Work (mentioning)

confidence: 99%

Fault‐tolerant on‐board computing for robotic space missions

Zima

James

Springer

2011

Concurrency and Computation

This paper describes an approach to providing software fault tolerance for future deep-space robotic National Aeronautics and Space Administration missions, which will require a high degree of autonomy supported by an enhanced on-board computational capability. We focus on introspection-based adaptive fault tolerance guided by the specific requirements of applications. Introspection supports monitoring of the program execution with the goal of identifying, locating, and analyzing errors. Fault tolerance assertions for the introspection system can be provided by the user, derived from domain-specific knowledge, or obtained from the results of static or dynamic program analysis. This work is part of an ongoing project at the Jet Propulsion Laboratory in Pasadena, California.
… systems control the new generation of fly-by-wire aircraft, such as the Airbus and Boeing airliners. Most space missions of the past were largely controlled from Earth, so that a significant number of failures could be handled by putting the spacecraft into a 'safe' mode while Earth-bound controllers attempted to return it to operational mode. This approach will no longer work for future robotic deep-space missions, which will require enhanced autonomy and a powerful on-board computational capability. Such missions are becoming possible as a result of recent advances in microprocessor technology, which are leading to low-power many-core chips that already have on the order of 100 cores. These developments have a range of consequences for fault tolerance, some of them challenging and others providing new opportunities. In this paper, we focus on an approach for software-implemented application-adaptive fault tolerance, which is made possible by the enhanced multithreading capability of modern hardware. This paper is an extended and modified version of a paper presented at the Euro-Par 2010 conference [2].
The paper is structured as follows. In Section 2, we establish a conceptual basis, providing more precise definitions of dependability and fault tolerance. Section 3 gives an overview of future missions and their requirements. After outlining the global structure of our approach in Section 4, we take a closer look at the introspection framework and its structure (Section 5). Adaptive fault tolerance is discussed in Section 6. The paper ends with an overview of related work and concluding remarks in Sections 7 and 8.
FAULT TOLERANCE IN THE CONTEXT OF DEPENDABILITY
Methods for fault detection and recovery
Introspection-based fault tolerance provides a flexible approach that, in addition to applying innovative methods, can leverage existing technology. Methods that are useful in this context include assertion-based acceptance tests that check the value of an assertion and transfer control to the IFT
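
The assertion-based acceptance test mentioned at the end of this excerpt can be illustrated with a minimal Python sketch. It is not the JPL introspection framework: the function names, the range assertion, and the recovery action (plain re-execution) are hypothetical, and the `on_failure` callback merely stands in for the component that receives control when an assertion is violated.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class AcceptanceTestFailed(Exception):
    """Raised when the result still violates the assertion after recovery."""


def acceptance_test(
    primary: Callable[[], T],
    assertion: Callable[[T], bool],
    on_failure: Callable[[T], T],
) -> T:
    """Run `primary`, check its result, and hand control to `on_failure` if needed.

    `on_failure` is a hypothetical stand-in for the fault-tolerance handler
    that receives control when the assertion is violated.
    """
    result = primary()
    if assertion(result):
        return result
    recovered = on_failure(result)  # transfer control to the handler
    if assertion(recovered):
        return recovered
    raise AcceptanceTestFailed(f"assertion still violated: {recovered!r}")


# Hypothetical usage: a mean temperature that must lie in [0, 100] degrees C.
readings = [20.0, 21.5, 19.8]


def corrupted_mean() -> float:
    # Simulates an upset that corrupts the computed value.
    return sum(readings) / len(readings) + 1e6


def in_range(value: float) -> bool:
    return 0.0 <= value <= 100.0


def recompute(_bad: float) -> float:
    # Recovery action: re-execute the computation, this time without the fault.
    return sum(readings) / len(readings)


print(acceptance_test(corrupted_mean, in_range, recompute))
```

Re-checking the recovered result against the same assertion, and raising if it still fails, follows the general recovery-block pattern; an application-adaptive scheme could choose a different recovery action per assertion.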

“…DM and DM CubeSat technology development has been regularly documented in open literature [1], [2], [3], [4], [5], [6], [7], [8], [9]. A brief overview of DM and DM CubeSat technology is provided in this introductory section to provide a basis for what is discussed in the remainder of the paper.…”

Section: Introduction - DM/DM CubeSat Background (mentioning)

confidence: 99%

“…The DM TRL6 technology validation effort included the demonstration of low overhead and ease-of-use for MPI (Message Passing Interface)-based parallel applications. Summaries of the TRL6 technology validation effort can be found in [5], [6], [7], [8], and [9]. A comprehensive discussion of DM technology development from TRL4 through TRL6 is provided in [6].…”

mentioning

confidence: 99%

Small, light-weight, low-power, low-cost, high performance computing for CubeSats

Samson

2014

2014 IEEE Aerospace Conference

Dependable Multiprocessor (DM) CubeSat technology continued its steady path to flight with the development and demonstration of a DM CubeSat F-cubed (Form, Fit, and Function) payload processor flight prototype. The DM F-cubed flight prototype was developed and demonstrated as part of the SMDC TechSat Phase 2 effort. The F-cubed flight prototype was designed to be a reusable, scalable implementation that can be used in a variety of future CubeSat missions and can fit into a variety of CubeSat form factors including SMDC TechSat, SMDC-ONE, Pumpkin Sat, and other 3U CubeSat configurations. The paper includes a brief overview of DM and DM CubeSat technology development, the path from NMP ST8 to SMDC TechSat, and the SMDC TechSat Phase 1 effort, but focuses on the development of the F-cubed flight prototype and its use in the DM portion of the SMDC TechSat Phase 2 Demo.