Srikar Tati scite author profile

Full knowledge of the routing topology of the Internet is useful for a multitude of network management tasks. However, the full topology is often not known and is instead estimated using topology inference algorithms. Many of these algorithms use Traceroute to probe paths and then use the collected information to infer the topology. We perform real experiments and show that in practice routers may severely disrupt the operation of Traceroute and cause it to provide only partial information. We propose iTop, an algorithm for inferring the network topology when only partial information is available. iTop constructs a virtual topology, which overestimates the number of network components, and then repeatedly merges links in this topology to resolve it towards the structure of the true network. We perform extensive simulations to compare iTop to state-of-the-art inference algorithms. Results show that iTop significantly outperforms previous approaches and that its inferred topologies are within 5% of the original networks for all considered metrics. Additionally, we show that the topologies inferred by iTop significantly improve the performance of fault localization algorithms when compared to other approaches.


Adaptive Algorithms for Diagnosing Large-Scale Failures in Computer Networks

Tati

Swami

et al. 2015

IEEE Trans. Parallel Distrib. Syst.


Abstract: In this paper, we propose an algorithm to efficiently diagnose large-scale clustered failures. The algorithm, Cluster-MAX-COVERAGE (CMC), is based on a greedy approach. We address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to quickly output a hypothesis list with a low number of false negatives and false positives. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100% accuracy. Moreover, CMC accomplishes this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. Furthermore, we propose an adaptive algorithm called Adaptive-MAX-COVERAGE (AMC) that performs efficiently during both kinds of failures, i.e., independent and clustered. During a series of failures that includes both independent and clustered failures, AMC results in a reduced number of false negatives and false positives.

[...] [13] is proposed to localize large-scale failures in networks. It is shown that by considering the failure patterns of large-scale outages, this algorithm can achieve higher accuracy than existing algorithms developed for independent failures [10]. However, the drawback of netCSI is that the run-time complexity of the algorithm increases exponentially with the size of the network, since it is a combinatorial approach.

Keywords: Fault

In this paper, we propose a new algorithm called Cluster-MAX-COVERAGE (CMC) that diagnoses large-scale clustered failures. To identify the faulty network elements (i.e., network nodes, routers, and links), CMC utilizes a knowledge base of possible network paths and end-to-end symptom information. The observed end-to-end symptoms during failures include both negative symptoms, such as which source-destination pairs are disconnected, and positive symptoms, such as which source-destination pairs can still communicate. This information is reported to the network manager by a few selected nodes in the network called reporting nodes; a complete list of symptoms is not required. Using this information, CMC outputs a hypothesis list, which consists of a set of network elements whose failures are consistent with the symptoms.

To solve the issue of run-time complexity, CMC adopts a greedy approach when generating the hypothesis list of faulty network elements, as opposed to the combinatorial approach in netCSI. Our greedy approach is similar to a fault diagnosis algorithm called MAX-COVERAGE (MC) [10], which was developed to diagnose black holes or silent failures (independent failures) in IP networks. During clustered failures, the performance of MC degrades significantly; in particular, it produces a prohibitively high number of false negatives (see Section V-C1). To overcome this limitation, CMC uses clusters of objects instead of single objects when forming the hypothesis list.

The major contributions of CMC include:

• Clustering models: To diagnose large-scale failures, CMC selects clusters of objects greedily based on c...
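The greedy, coverage-based idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' CMC implementation: it selects single elements in the style of a MAX-COVERAGE baseline rather than clusters, and the function name `greedy_hypothesis`, the path dictionary layout, and the example router names are all assumptions made for illustration. Positive symptoms are used to exonerate elements on working paths, and negative symptoms are then covered greedily.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (source, destination); hypothetical representation


def greedy_hypothesis(
    paths: Dict[Pair, Set[str]],
    failed_pairs: Set[Pair],
    working_pairs: Set[Pair],
) -> Tuple[List[str], Set[Pair]]:
    """Greedy single-element coverage sketch (not the clustered CMC variant).

    paths maps each source-destination pair to the elements on its path.
    Returns the hypothesis list and any failed pairs left unexplained.
    """
    # Positive symptoms: every element on a working path is assumed healthy.
    exonerated: Set[str] = set()
    for pair in working_pairs:
        exonerated |= paths[pair]

    unexplained = set(failed_pairs)
    hypothesis: List[str] = []
    while unexplained:
        # Count how many unexplained failed paths each candidate lies on.
        counts: Dict[str, int] = {}
        for pair in unexplained:
            for elem in paths[pair] - exonerated:
                counts[elem] = counts.get(elem, 0) + 1
        if not counts:
            break  # remaining failures cannot be explained by any candidate
        # Pick the element covering the most failures; sort for stable ties.
        best = max(sorted(counts), key=lambda e: counts[e])
        hypothesis.append(best)
        unexplained = {p for p in unexplained if best not in paths[p]}
    return hypothesis, unexplained


# Hypothetical example: r2 lies on both failed paths, r1 is exonerated.
example_paths = {
    ("a", "d"): {"r1", "r2"},
    ("b", "d"): {"r2", "r3"},
    ("a", "c"): {"r1", "r4"},
}
hypothesis, unexplained = greedy_hypothesis(
    example_paths,
    failed_pairs={("a", "d"), ("b", "d")},
    working_pairs={("a", "c")},
)
print(hypothesis, unexplained)
```

In this toy example the working pair exonerates `r1`, so the single element `r2` explains both failed pairs. Replacing the per-element counts with per-cluster counts would move the sketch closer to the clustered selection the abstract describes, but the clustering models themselves are not given in the text above.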


netCSI: A Generic Fault Diagnosis Algorithm for Large-Scale Failures in Computer Networks

Tati

Rager

Ko

et al. 2011


In this paper, we present a framework and a set of algorithms for determining faults in networks when large-scale outages occur. The design principles of our algorithm, netCSI, are motivated by the fact that failures are geographically clustered in such cases. We address the challenge of determining faults with incomplete symptom information due to a limited number of reporting nodes in the network. netCSI consists of two parts: a hypothesis generation algorithm and a ranking algorithm. When constructing the hypothesis list of potential causes, we make novel use of the positive and negative symptoms to improve the precision of the results. The ranking algorithm is based on conditional failure probability models that account for the geographic correlation of the network objects in clustered failures. We evaluate the performance of netCSI for networks with both random and realistic topologies. We compare the performance of netCSI with an existing fault diagnosis algorithm, MAX-COVERAGE, and achieve an average gain of 128% in accuracy for realistic topologies.




Srikar Tati

Robust Network Tomography in the Presence of Failures

Network Coding aware Rate Selection in multi-rate IEEE 802.11

Network Topology Inference With Partial Information

Adaptive Algorithms for Diagnosing Large-Scale Failures in Computer Networks

netCSI: A Generic Fault Diagnosis Algorithm for Large-Scale Failures in Computer Networks
