2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2021
DOI: 10.1109/ipdps49936.2021.00010

Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC Monitoring Data

Abstract: Modern High-Performance Computing (HPC) and data center operators rely more and more on data analytics techniques to improve the efficiency and reliability of their operations. They employ models that ingest time-series monitoring sensor data and transform it into actionable knowledge for system tuning: a process known as Operational Data Analytics (ODA). However, monitoring data has a high dimensionality, is hardware-dependent and difficult to interpret. This, coupled with the strict requirements of ODA, make…

Citations: cited by 5 publications (9 citation statements)
References: 32 publications
“…Further, the models are naturally biased since they are trained to predict the maximum temperature rather than the average, leading to a natural tendency to over-estimation: this is a deliberate choice in order to make the control pipeline more robust against over-heating. Finally, using the CS method gives us compact models that have a negligible impact on overhead and that are resistant against changes in the available set of sensors over time: after proving its validity through tailored experiments in previous work, this is the first real-life use case for this technique [38]. 4.5.2 Impact on Infrastructure.…”
Section: Operational Results (mentioning)
confidence: 84%
“…Based on an extensive survey of experimental techniques proposed in the literature [3, 13, 18, 20, 21, 23, 26, 30, 34, 35, 40-42, 45, 47] and in light of our long-term experiences at LRZ [36][37][38], we derive a generic formulation for ODA - namely, we identify the main functional steps composing it. These are summarized in Figure 1: the first step consists in Monitoring of system resources by collecting sensor or log data; this is followed by Monitoring Data Processing, transforming the raw monitoring data into a polished representation that can be comprehended by ODA techniques - aggregation and dimensionality reduction, for example, are two common approaches to solve this task.…”
Section: State Of the Art And System Design (mentioning)
confidence: 99%