2021
DOI: 10.1007/978-3-030-85665-6_5

E2EWatch: An End-to-End Anomaly Diagnosis Framework for Production HPC Systems

Cited by 8 publications (3 citation statements)
References 32 publications
“…This data can then be used in a supervised learning task directly or after processing new features (feature construction). Examples of this approach are [45,17,46] where authors use supervised ML approaches to classify the performance variations and job-level faults in HPC systems. For fault detection, [8,18] propose a supervised approach based on Random Forest (an ensemble method based on decision trees) to classify faults in an HPC system.…”
Section: Related Work (mentioning)
confidence: 99%
“…Tabular data Time series Supervised [49,9] [47, 48, 10] Semi-supervised [5,6,43,22] Unsupervised [19,20] [21] The novelty of this paper is, in relation to the existing works, threefold:…”
Section: Related Work (mentioning)
confidence: 99%
“…Since anomalies in HPC systems are rare events, the problem of anomaly detection cannot be treated as a classical supervised learning problem [17,21]; the majority of works that treat it in a fully supervised fashion have been tested using synthetic [14,22] or injected anomalies [15]. Instead of learning the properties of both relevant classes, the standard approach is to learn just the properties of the system's normal operation: anything deviating from this normal operation is then recognized as an anomaly.…”
Section: Related Work (mentioning)
confidence: 99%