Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles 2021
DOI: 10.1145/3477132.3483577

Understanding and Detecting Software Upgrade Failures in Distributed Systems

Cited by 23 publications (1 citation statement)
References 33 publications
“…Prior work in this space has focused on two main directions. First, there has been several empirical studies on analyzing incidents and outages in production systems which have focused on studying incidents caused by certain type of issues [48]-[51] or issues from specific services and systems [52]-[54]. Second and more related to our work is the use of machine learning and data driven techniques for automating different aspects of incident lifecycle such as triaging [55], [56], diagnosis [57]-[59] and mitigation [5].…”
Section: A Incident Management (mentioning)
confidence: 99%