Gang Xian scite author profile

Large numbers of jobs are executed on supercomputers almost every day. Unfortunately, many of these jobs fail for various reasons, wasting resources and prolonging the waiting time of queued jobs. Predicting job failures allows adjustments to be made in advance, which is vital to the overall execution efficiency and reliability of the system. Existing job failure prediction methods are limited in scope, and the job features they depend on are complex to collect and difficult to apply in practice. This article investigates whether failed jobs can be predicted from known and synthetic features. We comprehensively analyze a large amount of historical data and a variety of features, and find that two novel features, running path and retry count, predict job failure well. The running path indicates which application type a job belongs to, and the retry count reflects the user's behavior when a job fails. Based on these novel features, we propose PreF, a machine-learning-based job failure prediction framework for supercomputers. Experimental results show that PreF correctly identifies over 89% of jobs and outperforms the latest related methods on the comprehensive evaluation indicator (S_score) by around 4%.
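The abstract does not spell out how the two features are computed. The sketch below shows one plausible way to derive them from a job log, under assumptions made here purely for illustration and not taken from the paper: the application type is the last component of the running path, and the retry count is the number of consecutive failed submissions by the same user from the same path immediately before a given job. The `Job` record, its fields, and the helper names `app_type` and `add_features` are hypothetical. The resulting columns would then be fed to a machine learning classifier, which is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Job:
    job_id: int
    user: str
    run_path: str  # hypothetical field: path the job was launched from
    failed: bool   # hypothetical field: True if the job ended in failure


def app_type(run_path: str) -> str:
    # Assumed rule: the final path component names the application type.
    return PurePosixPath(run_path).name


def add_features(jobs: list[Job]) -> list[tuple[int, str, int]]:
    """Return (job_id, app_type, retry_count) for jobs in submission order.

    retry_count uses an assumed definition: the number of consecutive failed
    jobs the same user submitted from the same running path just before this one.
    """
    streak: dict[tuple[str, str], int] = defaultdict(int)
    rows = []
    for job in jobs:
        key = (job.user, job.run_path)
        rows.append((job.job_id, app_type(job.run_path), streak[key]))
        # A failure extends the streak; a success resets it to zero.
        streak[key] = streak[key] + 1 if job.failed else 0
    return rows
```

Keying the streak on both user and path keeps one user's failures with one application from inflating the retry count of their other workloads; whether PreF scopes the feature this way is not stated in the abstract.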


Influence of cluster correlation on nanoclusters in Fe-Ni amorphous alloys

Liang, Xian, Zhou, et al. 2022

Journal of Alloys and Compounds


Visual Analysis of the High-performance Computing Jobs Based on the Comprehensive Load Scoring Algorithm

Xian, Tang, Yang, et al. 2021


A Study of Job Failure Prediction on Supercomputers with Application Semantic Enhancement

PreF: Predicting job failure on supercomputers with job path and user behavior