N-Modular Redundancy (NMR) and N-Version Programming (NVP) are two popular fault tolerance techniques in which hardware and software redundancy, respectively, are exploited to mask faults. Conventionally, redundant hardware is used to improve fault tolerance rather than throughput. We introduce a scheme for combined hardware-software fault tolerance, derived from NMR and NVP, that shows how redundancy can also be used to improve throughput by grouping the execution of several tasks. Our scheme uses a dynamic task allocation algorithm with an optimistic execution policy, in which the number of task executions is kept close to the minimum required to produce fault-free results. For equivalent hardware and software resources, the proposed method is 50% to 100% more efficient in terms of throughput and latency.
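The abstract does not spell out the allocation algorithm, so the following is only a minimal sketch of the general idea of optimistic redundant execution, not the authors' method. It assumes deterministic versions, that all correct versions return the same (hashable) result, and that at most `f` versions are faulty. Under those assumptions, `f + 1` agreeing results are already sufficient, because at least one of them must come from a correct version. Additional versions run only when the results disagree. The function name `optimistic_vote` is hypothetical.

```python
from collections import Counter
from typing import Any, Callable, List, Sequence, Tuple


def optimistic_vote(
    versions: Sequence[Callable[[Any], Any]], x: Any, f: int
) -> Tuple[Any, int]:
    """Run redundant versions on input x, tolerating up to f faulty results.

    Returns (agreed_result, number_of_executions). Results must be hashable.
    """
    # Masking f arbitrary faults by majority needs at least 2f + 1 versions.
    if len(versions) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 versions to mask f faults")

    # Optimistic start: f + 1 executions suffice if they all agree,
    # because at least one of them must come from a correct version.
    results: List[Any] = [v(x) for v in versions[: f + 1]]
    executed = f + 1

    while True:
        value, votes = Counter(results).most_common(1)[0]
        if votes >= f + 1:
            return value, executed
        # Disagreement: run one more version and vote again.
        if executed == len(versions):
            raise RuntimeError("no result reached f + 1 votes")
        results.append(versions[executed](x))
        executed += 1


double = lambda x: 2 * x   # a correct version
broken = lambda x: -1      # a faulty version
print(optimistic_vote([double, double, double], 21, f=1))  # (42, 2)
print(optimistic_vote([broken, double, double], 21, f=1))  # (42, 3)
```

In the fault-free case, this sketch uses `f + 1` executions instead of the `2f + 1` that a pessimistic NMR/NVP voter would always run. The spare capacity could then be given to other tasks. That is one plausible reading of how redundancy "can also be used to improve throughput", although the paper's own scheme may differ.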
Increasingly, services such as e-commerce, web hosting, and application hosting are being deployed over an infrastructure that spans multiple control domains. These end-to-end services require cooperation and internetworking among multiple organizations, systems, and entities. Currently, there are no standard mechanisms for sharing selective management information between service providers or between service providers and their customers. Such mechanisms are necessary for end-to-end service management and diagnosis, as well as for ensuring that the service level obligations between a service provider and its customers or partners are met. In this paper we describe an architecture that uses contracts based on service level agreements (SLAs) to share selective management information across administrative boundaries. We also describe the design of a prototype implementation of this architecture, which we have used to automatically measure, monitor, and verify service level agreements for Internet services.
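The following is a minimal sketch of how an SLA-based contract might be used to share management information selectively. A provider exposes only the metrics that the contract names, and those metrics are checked against the agreed thresholds. The class names, metric names, and threshold semantics are all hypothetical illustrations. They are not the prototype's actual design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Objective:
    metric: str              # hypothetical metric name, e.g. "response_time_ms"
    threshold: float
    higher_is_better: bool = False


@dataclass
class SLAContract:
    provider: str
    customer: str
    objectives: List[Objective] = field(default_factory=list)

    def shared_view(self, measurements: Dict[str, float]) -> Dict[str, float]:
        """Expose only the metrics the contract names; everything else stays private."""
        names = {o.metric for o in self.objectives}
        return {k: v for k, v in measurements.items() if k in names}

    def violations(self, measurements: Dict[str, float]) -> List[str]:
        """Check the shared metrics against each objective's threshold."""
        view = self.shared_view(measurements)
        problems: List[str] = []
        for o in self.objectives:
            value = view.get(o.metric)
            if value is None:
                problems.append(f"{o.metric}: not measured")
            elif o.higher_is_better and value < o.threshold:
                problems.append(f"{o.metric}: {value} below {o.threshold}")
            elif not o.higher_is_better and value > o.threshold:
                problems.append(f"{o.metric}: {value} above {o.threshold}")
        return problems


sla = SLAContract("hosting-provider", "shop", [
    Objective("response_time_ms", 200.0),
    Objective("availability", 0.999, higher_is_better=True),
])
raw = {"response_time_ms": 250.0, "availability": 0.9995, "cpu_load": 0.7}
print(sla.shared_view(raw))  # {'response_time_ms': 250.0, 'availability': 0.9995}
print(sla.violations(raw))   # ['response_time_ms: 250.0 above 200.0']
```

The internal `cpu_load` measurement never leaves the provider's domain. This illustrates the "selective" part of the sharing: the contract, rather than the provider's full management view, decides what the other party can see.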
The explosive growth of the Internet, the widespread use of the World Wide Web, and a trend toward the deployment of broadband residential networks are stimulating the development of new services such as interactive shopping, home banking, and electronic commerce. These services are federated, since they depend on an infrastructure that spans multiple independent control domains. Managing federated services and providing effective support to their customers is difficult, because any given authority can observe and control only a small part of the environment. We characterize different dimensions of this problem, drawing on our experience deploying a system that gives the home consumer broadband access to community content as well as to the Internet. This type of system is referred to as Broadband Interactive Data Services (BIDS). We then focus on diagnosis and describe a customer support tool that was developed to partially automate diagnosis in BIDS. We use our experience with this tool to derive a blueprint for a general architecture for managing federated services. The architecture is based on service contracts between control domains.
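The abstract does not describe how the support tool performs diagnosis. The following is therefore only a minimal sketch of one common approach: sequential fault localization along an end-to-end path of control domains. It assumes that each domain exposes a health probe to its neighbors under a service contract. The domain names, the `Probe` type, and both functions are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A probe is a check that a domain exposes under a service contract;
# it returns True when that domain's part of the service looks healthy.
Probe = Callable[[], bool]


def probe_ok(probe: Probe) -> bool:
    """Treat a probe that raises (e.g. the domain is unreachable) as a failure."""
    try:
        return bool(probe())
    except Exception:
        return False


def localize_fault(path: List[Tuple[str, Probe]]) -> Optional[str]:
    """Walk the end-to-end path from the customer outward and return the
    first domain whose probe fails, or None if every probe passes."""
    for domain, probe in path:
        if not probe_ok(probe):
            return domain
    return None


def unreachable() -> bool:
    raise ConnectionError("no route to domain")


# Hypothetical BIDS-like path: home network -> access network -> content provider.
path = [
    ("home_network", lambda: True),
    ("access_network", unreachable),
    ("content_provider", lambda: True),
]
print(localize_fault(path))  # access_network
```

This sketch can only narrow a problem down to the first domain whose contract-exposed probe fails. That limitation reflects the observation above that no single authority can see the whole environment.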