Flash-based solid-state drives (SSDs) have been widely adopted in storage systems, delivering better performance than their predecessors, hard disk drives (HDDs). However, the characteristics of flash media pose challenges for deploying SSD-based storage systems. First, flash media endure only a limited number of program/erase cycles, making them vulnerable to media failures. Second, SSD foreground I/Os can suffer from inconsistent performance due to interference from background operations such as garbage collection (GC). The primary solution to both problems is to introduce data redundancy. Redundancy not only allows raw bit errors to be detected and lost data to be recovered, but also enables I/O scheduling to sidestep SSDs that are experiencing performance degradation.
Compared with replication, data coding is a more space-efficient way to provide redundancy. However, coding makes it more challenging to simultaneously achieve low access latency, consistent performance, and fast recovery. This paper examines the design of coded storage in existing storage systems, with a focus on flash storage systems, and how these designs address the above challenges. We categorize coded storage techniques into in-device coding, cross-device coding, and cross-machine coding. These categories are designed for different scenarios and purposes but share some common design rationales. For each type of coded storage, we begin by presenting its theoretical basis, followed by an overview of how existing studies address the performance and endurance issues of coded storage from a systems perspective. Finally, we review the history of coded storage, list several key insights from existing works, and speculate on some promising directions for flash-oriented coded storage systems.