2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) 2017
DOI: 10.1109/ccgrid.2017.18

LOGAIDER: A Tool for Mining Potential Correlations of HPC Log Events

citations
Cited by 43 publications
(21 citation statements)
references
References 14 publications
“…It can not be subsumed by temporal characteristics such as MTBF or temporal recurrence. Interestingly, researchers at LLNL and Argonne National Laboratory have independently verified that our observations about spatial distribution of failures hold true in their systems as well [6,30], which are not Cray systems and have very different system composition.…”
Section: Discussion (mentioning)
confidence: 62%
“…The development of such analysis methodologies, tools and predictive models can lead to deeper insights about the underlying system behavior but it remains too complex to allow system administrators and users to compare reliability of two different systems. This work has introduced two new metrics to characterize temporal and spatial properties of failures that can be used by other failure analysis efforts [2,6,30].…”
Section: Related Work and Conclusion (mentioning)
confidence: 99%
“…Balliu et al (2015) propose BiDAl, a tool to characterize the workload of cloud infrastructures, They use log data from Google data clusters for evaluation and incorporate support to popular analysis languages and storage backends on their tool. Di et al (2017) propose LogAider, a tool that integrates log mining and visualization to analyze different types of correlation (e.g., spatial and temporal). In this study, they use log data from Mira, an IBM Blue Gene-based supercomputer for scientific computing, and reported high accuracy and precision in uncovering correlations associated with failures.…”
Section: Results (mentioning)
confidence: 99%
“…In the Mira cluster with diverse system logs, the Reliability, Availability, and Serviceability (RAS) log is our focus. The [12].…”
Section: Blue Gene/Q Mira Cluster and RAS Logs (mentioning)
confidence: 99%