Erasure coding is a well-known redundancy technique that has been widely deployed in modern storage systems to protect against failures. By introducing a small portion of coded redundancy into data storage, erasure coding has been shown to provide higher reliability guarantees than replication under the same storage overhead. Despite its storage efficiency, erasure coding incurs high performance overhead in repair and updates, and its reliability also depends on the amount of redundancy. Resolving the tensions among storage efficiency, performance, and reliability has been a major research direction in the literature for decades.
In this paper, we present an in-depth survey of the past, present, and future of erasure coding in storage systems. We conduct our survey from a systems perspective, with an emphasis on how erasure coding is deployed in practical storage systems. Specifically, we first review the use of erasure coding in storage systems from both academia and industry, and identify the challenges of deploying erasure coding in practice. We then review the erasure coding literature along three aspects: (i) new erasure code constructions, (ii) algorithmic techniques for efficient erasure coding operations, and (iii) erasure coding for emerging architectures. Finally, we outline future research directions for erasure coding.