Machine Learning Techniques for Fault Tolerance Management

Gururaj, H L; Flammini, Francesco; Swathi, B H; Nagaraj, Nandini; Ramesh, Sunil Kumar Byalaru

doi:10.1201/9781003319917-7

Computational Intelligence for Cybersecurity Management and Applications 2023

Cited by 1 publication

References 0 publications

A Proactive Approach to Fault Tolerance Using Predictive Machine Learning Models in Distributed Systems

Haroon, Siddiqui, Husain et al. 2024

IJERR

In cloud computing and large-scale distributed systems, uninterrupted service and operational reliability are crucial. Conventional fault tolerance techniques are usually reactive: they address problems only after they arise, which can cause performance degradation and downtime. This research instead offers a proactive approach to fault tolerance for distributed systems, using predictive machine learning models to anticipate significant failures before they occur. We combine machine learning algorithms with real-time analysis of large streams of operational data to predict system anomalies and possible breakdowns. Supervised learning algorithms such as Random Forests and Gradient Boosting are trained on historical data, capturing the patterns and correlations that precede system faults, and predict faults with high accuracy. Early fault detection allows preventive measures to be taken, reducing downtime and preserving system integrity.

To validate the approach, we designed and implemented a fault prediction framework in a simulated distributed system environment that mirrors contemporary cloud architectures. Our experiments show that the predictive models can forecast a wide range of faults, from hardware failures to network disruptions, with significant lead time, providing a window for preventive measures. We also assessed the impact of these pre-emptive actions on overall system performance, finding improved reliability and a reduction in mean time to recovery (MTTR), and we analyse the scalability and adaptability of the solution in diverse and dynamic distributed environments. Because the framework integrates with existing monitoring and management tools, it enhances fault tolerance without requiring extensive restructuring of current systems.
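The abstract reports two outcome measures, prediction lead time and mean time to recovery (MTTR), without giving formulas or data. As a minimal sketch of how such measures are commonly computed, the Python below uses a hypothetical `Incident` record and made-up timestamps; none of the names or numbers come from the paper, and the alert times stand in for the output of any fault-prediction model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Incident:
    """One fault, with timestamps in minutes (hypothetical record layout)."""
    predicted_at: Optional[float]  # when a model raised an alert; None if the fault was missed
    failed_at: float               # when the fault actually occurred
    recovered_at: float            # when service was restored


def mttr(incidents: List[Incident]) -> float:
    """Mean time to recovery: average of (recovery time - failure time)."""
    return mean(i.recovered_at - i.failed_at for i in incidents)


def mean_lead_time(incidents: List[Incident]) -> Optional[float]:
    """Average warning time for faults that were predicted before they occurred."""
    leads = [
        i.failed_at - i.predicted_at
        for i in incidents
        if i.predicted_at is not None and i.predicted_at < i.failed_at
    ]
    return mean(leads) if leads else None


# Made-up example data, not results from the paper.
incidents = [
    Incident(predicted_at=90.0, failed_at=100.0, recovered_at=130.0),
    Incident(predicted_at=None, failed_at=200.0, recovered_at=260.0),  # missed fault
    Incident(predicted_at=285.0, failed_at=300.0, recovered_at=315.0),
]

if __name__ == "__main__":
    print("MTTR (minutes):", mttr(incidents))
    print("Mean lead time (minutes):", mean_lead_time(incidents))
```

Comparing `mttr` over incidents handled reactively against incidents handled after an early alert is one plausible way to express the reduction the abstract describes; the paper's own evaluation procedure may differ.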
