We describe a prototype tool that analyzes combined aspects of reliability and performance for a variety of networks. Our network-wide analysis systematically generates failure scenarios (called network states), maps their effects from the physical layer through the network layer to the traffic layer, calculates the probability of each state, and evaluates a metric on it, until the expected value of the metric or a point on its distribution is estimated to sufficient accuracy. We describe an application of this multi-layer probabilistic methodology to the problem of partial link failures in an ISP backbone network.
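The loop described above (generate a state, compute its probability, evaluate a metric, stop once the estimate is accurate enough) can be sketched as follows. This is a minimal illustration under stated assumptions, not nperf's implementation: components are assumed to fail independently with known probabilities, states are visited in order of increasing number of failed components (roughly most-probable-first when failure probabilities are small), and the metric is assumed to lie in [0, 1], so the unexplored probability mass bounds the error. The names `performability`, `state_probability`, `fail_prob`, `sla_threshold`, and `tolerance` are hypothetical, and `metric` stands in for the physical-to-network-to-traffic mapping.

```python
from itertools import combinations
from math import prod


def state_probability(failed, fail_prob):
    """Probability that exactly the components in `failed` are down,
    assuming (hypothetically) independent component failures."""
    return prod(p if i in failed else 1.0 - p for i, p in enumerate(fail_prob))


def performability(fail_prob, metric, sla_threshold, tolerance=1e-6):
    """Visit states with 0, 1, 2, ... failed components until the
    unexplored probability mass is at most `tolerance`.

    `metric(failed)` must return a value in [0, 1], e.g. the fraction of
    traffic lost. Returns (lower, upper) bounds on E[metric] and on
    P(metric <= sla_threshold), plus the number of states visited.
    """
    n = len(fail_prob)
    explored = 0.0   # total probability of the states visited so far
    expected = 0.0   # partial sum of P(state) * metric(state)
    meets_sla = 0.0  # probability of visited states that satisfy the SLA
    visited = 0
    for k in range(n + 1):
        for combo in combinations(range(n), k):
            failed = frozenset(combo)
            p = state_probability(failed, fail_prob)
            m = metric(failed)
            explored += p
            expected += p * m
            if m <= sla_threshold:
                meets_sla += p
            visited += 1
        if 1.0 - explored <= tolerance:
            break
    # Unvisited states add between 0 and 1 times their mass to a metric in [0, 1].
    unexplored = max(0.0, 1.0 - explored)
    return {
        "expected_metric": (expected, expected + unexplored),
        "p_meets_sla": (meets_sla, meets_sla + unexplored),
        "states_visited": visited,
    }


# Hypothetical 3-link example: metric = fraction of links down,
# a stand-in for the fraction of traffic lost.
result = performability([0.01, 0.02, 0.01], lambda failed: len(failed) / 3,
                        sla_threshold=0.05, tolerance=1e-3)
print(result["states_visited"])  # 4
```

In this toy run, the no-failure and single-failure states already cover all but about 0.0005 of the probability mass, so enumeration stops after four states. An SLA such as "with 99.99% probability, at most 5% of traffic is unavailable" can be judged as met only if the lower bound on `p_meets_sla` reaches 0.9999.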
I. INTRODUCTION
As communications networks grow in size and complexity, evaluating their reliability and performance becomes increasingly critical. Network service providers usually guarantee their enterprise customers levels of service in terms of downtime, restoration delay, latency, etc. These service-level agreements (SLAs) are stated as, for example, "with 99.99% probability, at most 5% of the network traffic will be unavailable".
To determine whether a network meets such guarantees, we need a methodology that can model and evaluate the impact of failures on both reliability and performance. Reliability and performance are often analyzed separately, but then the results either fail to capture aspects of performance or depict performance only in an ideal, failure-free state. The notion of combining aspects of both reliability and performance is embodied in the concept of performability analysis ([Mey95], [Col99]). Because detailed reliability and performance analyses are each difficult problems in their own right, performability analysis necessarily simplifies both aspects in order to remain feasible.
Our view is that network performability analysis must involve the following essential ingredients:
• Network-wide analysis, instead of, e.g., studying the behavior of "reference" connections.
• Hierarchical, multi-level network models, to capture the way failures actually propagate through network layers.
• Associating probabilities with failures and with performance guarantees, instead of "deterministic" analyses.
• Systematic exploration of the space of all possible failures, as opposed to, for example, considering only single failures or performing what-if studies.
For this purpose we have developed nperf, a network performability analyzer. nperf examines both the reliability