Her research interests include self-regulatory learning, learning analytics, blended learning, mobile learning and computer-supported collaborative learning. Nyi Nyi Htun is a postdoctoral researcher in the Department of Computer Science at KU Leuven. His research interests include interactive recommender and information retrieval systems, human-computer interaction, digital health and business information systems. Martijn Millecamp is a PhD student at Augment, the Human-Computer Interaction group of the Department of Computer Science, KU Leuven. His research interests are user modelling for explainable recommender system interfaces and dashboards for learning analytics.
The growing complexity of scientific applications has led to the design and deployment of large-scale parallel systems. The IBM Blue Gene/L can hold in excess of 200K processors and has been designed for high performance and reliability. However, failures in such a large-scale parallel system are a major concern, since it has been demonstrated that a failure significantly reduces the performance of the system. Although reactive fault-tolerant policies effectively minimize the effects of faults, it has been shown that these techniques drastically reduce system performance. Proactive fault-tolerant policies have emerged as an alternative because they impose less performance degradation. Proactive fault-tolerant policies are based on the analysis of information about the state of the system. The monitoring system of the IBM Blue Gene/L generates online information about the state of the system's hardware and software and stores that information in the RAS event log. In this study, we design and implement a prototype module for online failure prediction. The prototype is tested and validated, in a realistic scenario, using the RAS event log of an IBM Blue Gene/L system. We show that our prototype predicts up to 70% of the fatal events.