2021
DOI: 10.1007/978-3-030-78713-4_11

Proctor: A Semi-Supervised Performance Anomaly Diagnosis Framework for Production HPC Systems

Cited by 9 publications (5 citation statements)
References 34 publications
“…As this is the first attempt at employing FL for this task, we decided to focus on a specific approach for anomaly detection in HPC, namely the semi-supervised method, as it has demonstrated very good results when applied to real HPC while not requiring fully annotated data (which is not always available in the HPC context); in future works, we plan to explore the impact of FL to other underlying ML models for anomaly detection. We start with a key assumption (the same described in previous semi-supervised approaches for HPC anomaly detection, see [5], [10]), that is we suppose that the training data will contain only examples corresponding to normal operating condition; this is a safe assumption as it is typically possible to identify relatively long healthy periods right after the installation of a new supercomputer before the HW components start to degrade. To allow for a fair comparison (FL usage against no FL) we opted to use as baseline ML model the same semisupervised deep neural network (DNN) employed in [10] 3 , namely an autoencoder network.…”
Section: Methods (mentioning)
confidence: 99%
“…Most data-driven approaches drew inspiration from Machine Learning (ML) and Deep Learning (DL) domain [5], including supervised approaches that assume labeled training data [6] and unsupervised ones [7]. Semi-supervised models have been demonstrated to be very good at merging the strengths from both areas' weaknesses [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…The XE were provided in the Digital Imaging and Communications in Medicine (DICOM) format. All XE were transformed from DICOM-format to NPY, which is readable for CNN architectures [21]. After the transformation, it was important to ensure that all the images were appropriately resized.…”
Section: Dataset Preparation (mentioning)
confidence: 99%
“…All mentioned approaches do not take into account temporal dependencies of data (models are not trained on time series but on tabular data containing no temporal information). System monitoring data approach [47] is the first to take into account temporal dependencies in data by calculating statistical features on temporal dimension (aggregation, sliding window statistics, lag features). Most approaches that deal with time series anomaly detection do so on system log data.…”
Section: Related Work (mentioning)
confidence: 99%