2022
DOI: 10.1002/cpe.7202

PreF: Predicting job failure on supercomputers with job path and user behavior

Abstract: Large numbers of jobs are executed on supercomputers almost every day. Unfortunately, many jobs fail for various reasons, wasting resources and prolonging the waiting time of queued jobs. Job failure prediction can guide adjustment measures in advance, which is vital to the system's overall execution efficiency and reliability. Existing job failure prediction methods each rely on a single approach, and the job features they require are complex to collect and difficult to apply. This arti…

Cited by 2 publications
(2 citation statements)
References 18 publications
“…In the studies by Banjongkan et al. [19] and Yoo et al. [20], although they employ tree structure models, their algorithms are single, and they do not consider the correlation of job application sequences as addressed in this article, leading to weaker predictive performance. Furthermore, compared to our previous research work [21, 22], the FP-JSC framework has achieved a promising prediction effect.…”
Section: Comparison With Other Methods
Citation type: mentioning
confidence: 74%
“…In this article, we are attempting to construct a machine learning model with rapid computing efficiency, robust interpretability, and no prior knowledge assumptions. In our previous research [21, 22], it was demonstrated that tree structure models' learning algorithms are better suited for our HPC system. Hence, we have opted for the following three learning algorithms.…”
Section: Tree Structure Algorithm
Citation type: mentioning
confidence: 98%