2021 IEEE/ACM International Workshop on Cloud Intelligence (CloudIntelligence) 2021
DOI: 10.1109/cloudintelligence52565.2021.00011

Learning Dependencies in Distributed Cloud Applications to Identify and Localize Anomalies

Abstract: Operation and maintenance of large distributed cloud applications can quickly become unmanageably complex, putting human operators under immense stress when problems occur. Utilizing machine learning for identification and localization of anomalies in such systems supports human experts and enables fast mitigation. However, due to the various interdependencies of system components, anomalies do not only affect their origin but propagate through the distributed system. Taking this into account, we present Arval…

Cited by 10 publications (3 citation statements)
References 20 publications
“…Seer [23] trained deep learning algorithms on massive amounts of data to identify root causes, but its performance may degrade with system updates. Scheinert proposed Arvalus and its improved algorithm D-Arvalus [24]. In the two algorithms, the system components are regarded as microservices, and the dependencies between components are regarded as connections, to identify the root cause in a graph.…”
Section: Related Work (mentioning)
confidence: 99%
“…When collected over time, metric data can provide an abstract representation of the state of each system component. As in our previous work [27], we define metric data as multivariate time series, i.e. a temporally ordered sequence of vectors S = (S_t ∈ ℝ^d : t = 1, 2, .…”
Section: A Preliminaries (mentioning)
confidence: 99%
“…Anomalous traces are detected if their STVs do not follow the distribution. Scheinert et al. [50] present a neural graph method to detect and localize anomalies. It models the components in the distributed cloud application as nodes and their dependencies as edges.…”
Section: Related Workmentioning
confidence: 99%