Cloud computing is widely popular due to its elasticity, cost-effectiveness, and reliability, among other advantages. It offers scalable services without any initial investment in servers, storage, or networks. Fault Tolerance (FT) is the ability of a system to continue performing its function despite unexpected hardware or software failures. Fault Tolerance in Cloud Computing (FTCC) is an important area of research because of its complexity; however, studies in this field remain scarce. Moreover, recent failures and availability issues at popular cloud providers demonstrate the need for more effective solutions. In this paper, we present a study of FTCC mechanisms and analyze their strengths and weaknesses. Based on this study, we compare the main fault tolerance techniques in terms of cost, overhead, failure types, performance, and the tools used. We also study and compare the models that enhance the performance of checkpoint-based and replication-based techniques.