2018 IEEE 11th International Conference on Software Testing, Verification and Validation (ICST) 2018
DOI: 10.1109/icst.2018.00034

Localizing Faults in Cloud Systems

Abstract: By leveraging large clusters of commodity hardware, the Cloud offers great opportunities to optimize the operative costs of software systems, but impacts significantly on the reliability of software applications. The lack of control of applications over Cloud execution environments largely limits the applicability of state-of-the-art approaches that address reliability issues by relying on heavyweight training with injected faults. In this paper, we propose LOUD, a lightweight fault localization approach that r…

Cited by 58 publications (63 citation statements)
References 39 publications
“…Beschastnikh et al [40] discuss the key features and debugging challenges of distributed systems and present a visualization tool named ShiViz. Leonardo et al [43] introduce a lightweight fault localization approach for cloud systems; it can localize faults with high precision, by relying on only lightweight positive training. In contrast to the preceding previous work, our work is the first to conduct delta debugging for microservice systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…Once the operation level degrades below a pre-set threshold, various restoration procedures must be carried out until the desired level of operation is achieved. Large-scale cloud systems might require additional steps in order to pinpoint the exact location of a fault [107].…”
Section: A Formal Resilience Orchestration (mentioning)
confidence: 99%
“…Besides, Tuncer et al [35] proposed a new framework for detecting anomalies in HPC systems by clustering statistical features that retained application characteristics from the time series. On another hand, Mariani et al [37] proposed a new approach named LOUD that associated machine learning with graph centrality algorithms. LOUD analyzed KPIs metrics collected from the running systems using machine learning lightweight positive training.…”
Section: Related Work (mentioning)
confidence: 99%