The central topic of this book is application-level fault-tolerance, that is, the methods, architectures, and tools that allow a fault-tolerant system to be expressed in the application software of our computers. Application-level fault-tolerance is a sub-class of software fault-tolerance that focuses on expressing the problems and solutions of fault-tolerance in the top layer of the hierarchy of virtual machines that constitutes our computers. This book shows that application-level fault-tolerance is a key ingredient in crafting truly dependable computer systems. Other approaches, such as hardware fault-tolerance, operating system fault-tolerance, or fault-tolerant middleware, are also important ingredients for achieving resiliency, but they are not enough. Failing to address the application layer means leaving a backdoor open to problems such as design faults, interaction faults, or malicious attacks, whose consequences for the quality of service could be as unfortunate as, e.g., those of a physical fault affecting the system platform. In other words, in most cases it is simply not possible to achieve complete coverage against a given set of faults or erroneous conditions without also embedding fault-tolerance provisions in the application layer. In what follows, the provisions for application-level fault-tolerance are called application-level fault-tolerance protocols. As a lecturer in this area, I wrote this book as my ideal textbook for a possible course on resilient computing and for my doctoral students in software dependability at the University of Antwerp. Despite this, the main goal of this book is not only education. The main mission of this book is first of all to spread awareness of the necessity of application-level fault-tolerance.
Another critical goal is to highlight the role of several important concepts that are often neglected or misunderstood: the fault and the system models, i.e., the assumptions on top of which our computer services are designed and constructed. Last but not least, this book aims to provide a clear view of the state of the art of application-level fault-tolerance, highlighting in the process a number of lessons learned through hands-on experience gathered in more than 10 years of work in the area of resilient computing. It is our belief that anyone who wants to include dependability among the design goals of their intended software services should have a clear understanding of concepts such as dependability, system models, failure semantics, and fault models, and of their influence on the quality of experience of the final product. Such information is often scattered among research papers, while it is presented here in a unitary ... EFTOS tools for exception handling, distributed voting, watchdog timers, fault-tolerant communication, atomic transactions, and data stabilization are discussed. The reader is also given a detailed description of RAFTNET (Raftnet, n.d.), a fault-tolerance library for data parallel applications. A second large class of application...