Proceedings of the Workshop on Hot Topics in Operating Systems 2021
DOI: 10.1145/3458336.3465297

Cores that don't count

Abstract: We are accustomed to thinking of computers as fail-stop, especially the cores that execute instructions, and most system software implicitly relies on that assumption. During most of the VLSI era, processors that passed manufacturing tests and were operated within specifications have insulated us from this fiction. As fabrication pushes towards smaller feature sizes and more elaborate computational structures, and as increasingly specialized instruction-silicon pairings are introduced to improve performance, w…

Citations: cited by 110 publications (8 citation statements)
References: 27 publications
“…This will undoubtedly create new resilience issues in future systems [4,5]. For academia and/or researchers, the reaction is consistent with Hennessy and Patterson's description of a renaissance in computer architectures [6]. New horizons to explore!…”
Section: What Are the Implications for HPC? (supporting)
confidence: 71%
“…First, many vendors already provide mean time to failure (MTTF) information for their software or hardware components. Secondly, cloud providers host larger numbers of hardware components in their data centers, which are constantly monitored, providing significant amounts of data also for rare events [34]. Thirdly, for yet unobserved failures of highly available components, one can use rare event analysis (an active research area) in conjunction with expert knowledge acquisition to incorporate prior beliefs first and later refine the estimate with observation during mission time.…”
Section: Discussion (mentioning)
confidence: 99%
“…As cloud scale, complexity, and operational experience continued to grow, additional optimization and leverage opportunities emerged, including software defined networking [46], protocol offloads, and custom network architectures (greatly reducing dependence on traditional network hardware vendors) [47]; quantitative analysis of processor [48], memory [49,50], network [51,52] and disk failure modes [53,54], with consequent redesign for reliability and lower cost (dictating specifications to vendors via consortia like Open Compute [55]); custom processor SKUs, custom accelerators (FPGAs and ASICs), and finally, complete processor design (e.g., Apple silicon, Google TPUs [56] and AWS Gravitons). In between, the cloud vendors deployed their own global fiber networks.…”
Section: The Rise of Cloud Services (mentioning)
confidence: 99%