2020
DOI: 10.1051/epjconf/202024503017

Operational Intelligence for Distributed Computing Systems for Exascale Science

Abstract: In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets require a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiment goals, by allowing timely delivery of scientific results. However, a substantial amount of interventions from software developers, shifters and operational teams is needed to efficiently manage such heterogeneous infrastruc…

Cited by 5 publications (3 citation statements)
References 4 publications
“…For this reason, several communities involved in the Worldwide LHC Computing Grid have started a project named Operational Intelligence 2 that aims at increasing the level of automation in computing operations, thus reducing human interventions. As a result of the joint effort, several strategies have already been proposed to support operational workflows in various ways [16][17][18][26]. Some works address anomaly detection by leveraging overall workloads, e.g.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Since the Tier-1 is an infrastructure dedicated to physics experiments (Di Girolamo et al., 2020), the resources used to keep the system operational must be optimized. To that end, one possible approach is to identify which log segments should be prioritized for processing, based on a higher probability of finding information useful for system maintenance.…”
Section: Preliminaries
Citation type: unclassified
“…In addition, user logs are noticed as service-oriented unstructured data. Large volumes of data are produced by a number of system logs, which makes the implementation of a general-purpose log-based predictive maintenance solution challenging [10]. Logging activity means the rate of lines written in a log file.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
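
The last statement defines logging activity as the rate of lines written to a log file. A minimal sketch of that definition follows. It is not taken from the cited papers: the function name `logging_activity`, the 60-second window, and the line format (each line starting with a naive ISO-8601 timestamp followed by a space) are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

# Fixed reference point used only to align windows (assumes naive timestamps).
EPOCH = datetime(1970, 1, 1)


def logging_activity(lines, window_seconds=60):
    """Return the logging rate (lines per second) for each time window.

    Hypothetical input format: every line begins with an ISO-8601
    timestamp without timezone, then a space, then the message.
    """
    counts = Counter()
    for line in lines:
        stamp = line.split(" ", 1)[0]
        ts = datetime.fromisoformat(stamp)
        # Floor the timestamp to the start of its window.
        offset = int((ts - EPOCH).total_seconds())
        counts[offset - offset % window_seconds] += 1
    # Map each window start to lines written per second in that window.
    return {
        EPOCH + timedelta(seconds=start): n / window_seconds
        for start, n in sorted(counts.items())
    }


# Made-up sample lines, for illustration only.
sample = [
    "2020-01-01T00:00:05 job started",
    "2020-01-01T00:00:30 transfer queued",
    "2020-01-01T00:01:10 transfer done",
]
rates = logging_activity(sample)
```

Windows with no lines are simply absent from the result. A real log would need its own timestamp parsing, and the window size would depend on how bursty the service is.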