Aggressive Fault Tolerance in Cloud Computing Using Smart Decision Agent

Rahman, Md. Mostafijur; Rouf, M. A.

doi:10.1007/978-981-16-6636-0_26

Cited by 5 publications

(4 citation statements)

References 28 publications

“…The second approach provides fault tolerance by integrating reactive and adaptive approaches. Rahman et al 144 • Performance degrades et al 152 presented an energy-aware proactive fault-tolerant scheduling mechanism for cloud computing based on Artificial Intelligence. The proposed framework consists of two steps, that is, predicting the probability of task failure based on a model trained using DNN and scheduling the predicted task to the most suitable hosts.…”

Section: Hybrid Fault Tolerance Approaches (mentioning)

confidence: 99%

“…The second approach provides fault tolerance by integrating reactive and adaptive approaches. Rahman et al 144 introduced an intelligent decision agent‐based technique for detection and recovery from faults in cloud systems. The given approach combines replication, checkpointing, and resubmission with a smart agent, which decides to select one of the fault tolerance approaches based on fault types.…”

Section: Taxonomy Of Fault Tolerance Approaches (mentioning)

confidence: 99%

Fault‐tolerance approaches for distributed and cloud computing environments: A systematic review, taxonomy and future directions

Kirti, Maurya, Yadav

2024

Concurrency and Computation

Fault tolerance is crucial in ensuring smooth working of distributed and cloud computing. It is challenging to implement because of the constantly changing infrastructure and complex configurations in distributed and cloud computing. Implementation of various fault tolerance methods requires domain‐specific knowledge as well as in‐depth understanding of the existing techniques and approaches. Recent surveys on fault tolerance in cloud and distributed environments exist, but they have limitations. This article systematically reviews fault tolerance approaches in distributed and cloud computing and discusses their taxonomy. Based on the taxonomy provided, fault‐tolerance approaches are divided into four types, that is, reactive approaches, proactive approaches, adaptive approaches, and hybrid approaches. Reactive approaches provide a preventive measure after the occurrence of faults in the system. Proactive approaches prevent the system or minimize failure effects by predicting in advance. The adaptive approaches predict, learn, and adapt to the changes to deal with new faults in the system. The hybrid approaches combine reactive, proactive, and adaptive approaches. The objective of this article is to give a better understanding of handling faults using suitable approaches and further compare them on various parameters. The paper also presents a promising research direction based on the challenges and issues in multiple approaches.

“…The scalability and computational overhead of the proposed approach have not been evaluated. To reduce the effect of transient failures, the authors of [26] proposed an aggressive fault tolerance approach in a cloud environment to detect and recover from failures. An intelligent decision agent is used by the aggressive fault detection and recovery module to detect and recover from faults.…”

Section: Related Work (mentioning)

confidence: 99%

A New Fault-Tolerant Algorithm Based on Replication and Preemptive Migration in Cloud Computing

Semmoud, Hakem, Benmammar, et al.

2022

International Journal of Cloud Applications and Computing

Cloud computing is a promising paradigm that provides users higher computation advantages in terms of cost, flexibility, and availability. Nevertheless, with potentially thousands of connected machines, faults become more frequent. Consequently, fault-tolerant load balancing becomes necessary in order to optimize resource utilization while ensuring the reliability of the system. Common fault tolerance techniques in cloud computing have been proposed in the literature. However, they suffer from several shortcomings: some fault tolerance techniques use checkpoint-recovery, which increases the average waiting time and thus the mean response time, while other models rely on task replication, which reduces the cloud's efficiency in terms of resource utilization under variable loads. To address these deficiencies, an efficient and adaptive fault-tolerant algorithm for load balancing is proposed. Based on the CloudSim simulator, a series of test-bed scenarios is considered to assess the behavior of the proposed algorithm.

“…• Fault Tolerance Parameters in Cloud Computing: Different parameters are used to assess the efficiency and effectiveness of fault tolerance approaches in cloud computing systems (Rahman & Rouf, 2022).These parameters, crucial for evaluating cloud system performance includes Adaptive, Response Time, Performance, Throughput, Reliability, Availability, Usability, Overhead. Table 1, provides an overview of common failures in computing systems, categorizing them into hardware, software, and network errors.…”

mentioning

confidence: 99%

Leveraging Distributed Systems for Fault-Tolerant Cloud Computing: A Review of Strategies and Frameworks

M. Almufti, R. M. Zeebaree

2024

ACAD J NAWROZ UNIV

Ensuring system availability and reliability is crucial in the quickly developing field of cloud computing. The importance of fault tolerance in cloud infrastructure systems grows as organizations become more reliant on it to support their critical operations. The purpose of this article is to investigate the intricate realm of cloud computing and distributed systems. Specifically, the paper will investigate the numerous forms of cloud computing, fault tolerance methods, and frameworks that enable cloud services to be robust and durable. Cloud computing has transformed the way in which organizations and individuals access and administer computing resources. The paper discusses several deployment options, including public, private, hybrid, and multi-cloud environments, which provide organizations with the advantages of flexibility, scalability, and cost-effectiveness. The inherent flexibility of cloud computing renders it well-suited for a diverse range of applications, spanning from the hosting of websites to the execution of intricate data analytics processes. Generally, cloud computing encounters substantial obstacles, including the need of maintaining uninterrupted service in the face of hardware failures, network outages, or software errors, despite its tremendous benefits. The critical importance of fault tolerance in this particular situation cannot be overstated, as it plays a pivotal role in maintaining the dependability and availability of the system. The primary objective of this study is to examine the utilization of distributed systems as a means to augment fault tolerance within the realm of cloud computing and distributed systems. Distributed systems offer an optimal approach for addressing difficulties related to fault tolerance, owing to its intrinsic capability to divide workloads and data over several nodes. 
This approach utilizes redundancy, replication, and the ability to recover seamlessly from disturbances, hence enhancing the resilience and resource efficiency of cloud services. This research reviews novel techniques and frameworks that utilize distributed systems to create fault-tolerant cloud computing architectures, emphasizing their substantial influence on the cloud computing domain. In conclusion, this research report includes a comparative analysis table that encompasses twenty preceding works.
