Proceedings of the ACM SIGCOMM 2010 Conference 2010
DOI: 10.1145/1851182.1851220

California fault lines

Abstract: Of the major factors affecting end-to-end service availability, network component failure is perhaps the least well understood. How often do failures occur, how long do they last, what are their causes, and how do they impact customers? Traditionally, answering questions such as these has required dedicated (and often expensive) instrumentation broadly deployed across a network. We propose an alternative approach: opportunistically mining "low-quality" data sources that are already available in modern network e…

Cited by 107 publications (5 citation statements)
References 30 publications
“…Fig. 1 shows the distribution of failure impact (i.e., maximum link utilization (MLU) increase) under OSPF 2 and optimal (MCF) routing scheme for three real-world large scale network topologies with more than 100 nodes. It turns out that only 0.19%, 0.03%, and 3.43% failure scenarios on Ion, Interoute, and DialtelecomCz, respectively, under the optimal routing scheme, cause significant impact (i.e., more than 80% of worst-case failure impact) to the network availability.…”
Section: Motivation Of Fern (mentioning)
confidence: 99%
“…As the scale and complexity of modern networks continue to increase rapidly, the occurrence of failures has become a common and frequent event in both wide area networks (WANs) [1], [2], [3], [4], [5], [6] and data center networks (DCNs) [7]. Recently, increasing research efforts have considered tackling the problem from different aspects.…”
Section: Introduction (mentioning)
confidence: 99%
“…In view of that, in Algorithm 7, whenever a valid notification is received, the procedure will also verify whether the notification was received within a reasonably short amount of time of the previously detected local failure (Lines 7-8). According to measurement studies (GOVINDAN et al, 2016; TURNER et al, 2012; GILL; JAIN; NAGAPPAN, 2011; TURNER et al, 2010; MARKOPOULOU et al, 2008), the majority of times, when two links fail simultaneously, they belong to the same shared-risk group. With that observation in mind, whenever a switch is aware of two single-link failures happening in a short period in time, it will transition to a tactic that handles the shared-risk group to which the two links belong by looking up table SF-TACTICS (L.9-11).…”
Section: Switch and Other Shared-risk Multi-link Failures (mentioning)
confidence: 99%
“…Consequently, computing alternative forwarding entries for all possible failure scenarios would both take an impractically long time and require prohibitive amounts of memory. Despite that, network measurement studies (GOVINDAN et al, 2016; TURNER et al, 2012; GILL; JAIN; NAGAPPAN, 2011; TURNER et al, 2010; MARKOPOULOU et al, 2008) show that although failures happen frequently (a few minutes apart), it is very unusual for two elements (e.g., links, switches, optical-fiber cable) to fail at the "same time", unless they belong to the same shared-risk group. For example, two links connected to the same switch are perceived as failed whenever the switch itself fails.…”
Section: Many Failure Scenarios (mentioning)
confidence: 99%
“…There already exists interesting literature on the empirical characteristics of failures, e.g., in datacenters [12], [31], statewide networks [29], or IP backbones [16]. This literature is highly valuable for the comparison of existing networks, but does not directly solve the problem of comparing network designs that are not yet implemented.…”
Section: Related Work (mentioning)
confidence: 99%