Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017
DOI: 10.23919/date.2017.7927143
Continuous learning of HPC infrastructure models using big data analytics and in-memory processing tools

Abstract: Exascale computing represents the next leap in the HPC race. Reaching this level of performance is subject to several engineering challenges such as energy consumption, equipment cooling, reliability and massive parallelism. Model-based optimization is an essential tool in the design process and control of energy-efficient, reliable and thermally constrained systems. However, in the Exascale domain, model learning techniques tailored to the specific supercomputer require real measurements and must therefore han…

Cited by 48 publications (41 citation statements)
References 12 publications
“…These agents monitor the power consumption of each node at the plug, as well as performance and utilization metrics, using both software commands and hardware sensors. The measured values are sent to a data management backbone (namely ExaMon [BBCB17]) through a communication layer based on the open-source MQTT (MQ Telemetry Transport) protocol [Sta14], which is designed for low-bandwidth, high-latency networks and minimal resource demands.…”
Section: Target Supercomputer and Monitoring Framework
confidence: 99%
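The agents described above publish each sensor reading as an MQTT message. As a minimal sketch of that pattern, the helper below builds a topic/payload pair for one reading; the topic layout and JSON field names here are illustrative assumptions, not ExaMon's actual schema.

```python
import json
import time

# NOTE: hypothetical topic hierarchy -- the excerpt above does not show
# ExaMon's real naming scheme, so this layout is an assumption.
def make_sample(cluster, node, sensor, value, timestamp=None):
    """Return an (MQTT topic, JSON payload) pair for one sensor reading."""
    topic = f"{cluster}/node/{node}/{sensor}"
    payload = json.dumps({"value": value, "ts": timestamp or time.time()})
    return topic, payload

topic, payload = make_sample("davide", "node042", "plug_power_w", 312.5,
                             timestamp=1500000000.0)
# A real agent would then publish with an MQTT client library, e.g.:
#   client.publish(topic, payload, qos=0)
```

MQTT's small fixed header and fire-and-forget QoS 0 delivery are what make this per-reading publish cheap enough for the low-overhead, per-node agents the quoted papers describe.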
“…The integrated monitoring infrastructure periodically reads a set of metrics and collects them into a single gathering point. The authors of [BBCB17] show that these sensors can easily reach 1.5 kSa/s per compute node, and propose Examon, a scalable infrastructure based on local monitoring agents pushing data through the MQTT protocol. Clearly, local software-based monitoring agents compete for the same computational resources as users' applications.…”
Section: Introduction
confidence: 99%
“…The data collection infrastructure deployed in D.A.V.I.D.E. is called Examon and has been presented in previous works [BBCea17, BBLea18]. Examon is a fine-grained, lightweight and scalable monitoring infrastructure for Exascale supercomputers.…”
Section: Data Collection
confidence: 99%
“…These elements include the node and rack cooling components as well as environmental parameters such as the room and ambient temperature. In ANTAREX, we developed ExaMon [26] (Exascale Monitoring) to virtualise the performance and power monitoring access in a distributed environment. ExaMon decouples the sensor readings from the sensor value usage.…”
Section: Monitoring
confidence: 99%
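The decoupling the last excerpt attributes to ExaMon — sensor producers publish without knowing who consumes the values — is the standard publish/subscribe pattern. The toy in-process broker below sketches that idea only; it stands in for a real MQTT broker, and all class and topic names are invented for illustration.

```python
from collections import defaultdict

class MiniBroker:
    """Toy in-process broker: producers publish by topic; any number of
    consumers subscribe independently, without the producers knowing."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, value):
        # Deliver the reading to every subscriber of this topic.
        for cb in self._subs[topic]:
            cb(topic, value)

broker = MiniBroker()
power_log = []
# Consumer side: attach a logger after the fact.
broker.subscribe("node042/plug_power_w", lambda t, v: power_log.append(v))
# Sensor side: publish readings with no knowledge of the consumers.
broker.publish("node042/plug_power_w", 310.0)
broker.publish("node042/plug_power_w", 305.5)
```

In ExaMon the broker is an external MQTT service, so monitoring agents stay lightweight while analytics backends subscribe to whichever sensor streams they need.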