Proceedings of International Symposium on Grids & Clouds 2019, PoS(ISGC2019), 2019
DOI: 10.22323/1.351.0003

Towards Predictive Maintenance with Machine Learning at the INFN-CNAF computing centre

Abstract: The INFN-CNAF computing center, one of the Worldwide LHC Computing Grid Tier-1 sites, is serving a large set of scientific communities, in High Energy Physics and beyond. In order to increase efficiency and to remain competitive in the long run, CNAF is launching various activities aiming at implementing a global predictive maintenance solution for the site. This requires a site-wide effort in collecting, cleaning and structuring all possibly useful data coming from log files of the various Tier-1 services and…

Cited by 12 publications (10 citation statements)
References 7 publications
“…Currently, both supervised and unsupervised approaches are used [18], as well as anomaly detection and log template extraction approaches [19], and various techniques adopted from Natural Language Processing (NLP) research. All methods are based on a quick prototype and validation cycle, in order to timely develop, train, and test models against new unseen log data being collected in real time, with the goal of validating one or more approaches as valuable to the early detection of symptoms of future failures at sites.…”
Section: Site Operations (mentioning)
confidence: 99%
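
The log template extraction mentioned in the statement above can be illustrated with a minimal sketch. This is not the method used in the cited works; it assumes a simple regex-masking approach, and the masking rules, the function names `to_template` and `rare_templates`, and the sample log lines are all hypothetical. Variable fields (IP addresses, hexadecimal values, integers) are replaced by placeholders so that lines with the same structure collapse to one template, and templates that occur rarely are reported as candidate anomalies.

```python
import re
from collections import Counter

# Hypothetical masking rules (an assumption for illustration): each pattern
# replaces one kind of variable field with a fixed placeholder. Order matters:
# IP addresses and hex values are masked before plain integers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Reduce a raw log line to its template by masking variable fields."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()


def rare_templates(lines, max_count=1):
    """Return, sorted, the templates seen at most max_count times."""
    counts = Counter(to_template(line) for line in lines)
    return sorted(t for t, c in counts.items() if c <= max_count)


# Hypothetical sample log lines, not taken from any real service.
sample = [
    "Request 17 from 10.0.0.1 completed in 35 ms",
    "Request 18 from 10.0.0.2 completed in 40 ms",
    "Disk error at 0x1f on node 3",
]
print(rare_templates(sample))
```

Counting templates is only a crude stand-in for anomaly detection; in practice the rarity threshold and the masking rules would have to be tuned to each service's log format.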
“…A first work based on the Elastic Stack Suite catalogues log records and anomalies using an unsupervised machine learning tool (Diotalevi et al, 2019). Another initiative uses supervised computational intelligence approaches to predict anomalies in system behaviour in an ad-hoc solution (Giommi et al, 2019). In (Minarini, 2019) a prototype was created to differentiate usual and anomalous system behaviour via support vector machines.…”
Section: Contributions to Predictive Maintenance (unclassified)
“…Many efforts have been made to maintain the quality of service (QoS) of the WLCG. In particular, at the Tier-1 computing center at Bologna, supervised machine learning methods to predict anomalies of the StoRM service [13] have been considered by using ad-hoc data processing methods. Another initiative concerns a system based on the Elastic Stack Suite to collect, parse and catalogue log data, as well as classifying anomalies using an embedded unsupervised learning tool [14].…”
Section: The Data Storage Service (mentioning)
confidence: 99%