Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing 2020
DOI: 10.1145/3369583.3392674

DCDB Wintermute: Enabling Online and Holistic Operational Data Analytics on HPC Systems

Cited by 22 publications (17 citation statements)
References 34 publications
“…However, this system does not consider root-cause analysis of the performance reduction. The authors of [5], [39] do not only provide the feature of real-time monitoring, but are also able to identify the performance issues and trouble-shoot the cause of the issues. In addition to them, [40] uses a type of artificial neural network called autoencoder for anomaly detection.…”
Section: Related Work (mentioning)
confidence: 99%
“…Based on an extensive survey of experimental techniques proposed in the literature [3, 13, 18, 20, 21, 23, 26, 30, 34, 35, 40-42, 45, 47] and in light of our long-term experiences at LRZ [36][37][38], we derive a generic formulation for ODA - namely, we identify the main functional steps composing it. These are summarized in Figure 1: the first step consists in Monitoring of system resources by collecting sensor or log data; this is followed by Monitoring Data Processing, transforming the raw monitoring data into a polished representation that can be comprehended by ODA techniques - aggregation and dimensionality reduction, for example, are two common approaches to solve this task.…”
Section: State Of the Art And System Design (mentioning)
confidence: 99%
“…A wide variety of software solutions falling under our definition of ODA are available in the literature, aiming to cover the different aspects of an HPC system's operation [37]. The vast majority of these techniques have never been employed in a production context, and thus their practical applicability is not proven: on the other hand, a small subset of tools are indeed known to have been used in production environments, but are either tailored for individual use cases or lack any report of long-term operation.…”
Section: State Of the Art (mentioning)
confidence: 99%