Flat XOR-based erasure codes in storage systems: Constructions, efficient recovery, and tradeoffs

Greenan, Kevin M.; Li, Xiaozhou; Wylie, Jay J.

doi:10.1109/msst.2010.5496983

Cited by 72 publications

(33 citation statements)

References 45 publications

“…Hand-tuning the code with respect to multiple cores and SSE extensions will also yield significant performance gains. Other code properties, like the amount of data required for recovery, may limit performance more than the CPU overhead [11], [28]. We look forward to addressing these challenges in future work.…”

Section: Discussion (mentioning)

confidence: 99%

Heuristics for optimizing matrix-based erasure codes for fault-tolerant storage systems

Plank; Schuman; Robison (2012)

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)

Abstract: Large-scale, archival, and wide-area storage systems use erasure codes to protect users from losing data due to the inevitable failures that occur. All but the most basic erasure codes employ bit-matrices so that encoding and decoding may be effected solely with the bitwise exclusive-OR (XOR) operation. There are CPU savings that can result from strategically scheduling these XOR operations so that fewer XORs are performed. It is an open problem to derive a schedule from a bit-matrix that minimizes the number of XOR operations. We attack this open problem, deriving two new heuristics called Uber-CHRS and X-Sets to schedule encoding and decoding bit-matrices with reduced XOR operations. We evaluate these heuristics in a variety of realistic erasure coding settings and demonstrate that they are a significant improvement over previously published heuristics. We provide an open-source implementation of these heuristics so that practitioners may leverage our work.
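To make the scheduling idea in this abstract concrete, here is a minimal Python sketch of bit-matrix encoding and of how reusing an already computed output can lower the XOR count. It is not Uber-CHRS or X-Sets; the greedy "start from the closest earlier output" rule, the function names, and the toy matrix are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

Plan = List[Tuple[Optional[int], List[int]]]

def encode(bitmatrix: List[List[int]], data: List[int]) -> List[int]:
    """Reference encoder: output i is the XOR of data[j] for every 1 in row i."""
    out = []
    for row in bitmatrix:
        acc = 0
        for bit, word in zip(row, data):
            if bit:
                acc ^= word
        out.append(acc)
    return out

def naive_xor_count(bitmatrix: List[List[int]]) -> int:
    """A row with w ones needs w - 1 XORs when computed from scratch."""
    return sum(max(sum(row) - 1, 0) for row in bitmatrix)

def schedule_with_reuse(bitmatrix: List[List[int]]) -> Tuple[int, Plan]:
    """Hypothetical greedy heuristic: build each output either from scratch or
    from an earlier output, whichever needs fewer XORs."""
    total = 0
    plan: Plan = []
    for i, row in enumerate(bitmatrix):
        ones = [j for j, b in enumerate(row) if b]
        best_cost, best = max(len(ones) - 1, 0), (None, ones)
        for k in range(i):
            # Columns where row i and row k differ: one XOR each fixes them up.
            diff = [j for j, (a, b) in enumerate(zip(row, bitmatrix[k])) if a != b]
            if len(diff) < best_cost:
                best_cost, best = len(diff), (k, diff)
        total += best_cost
        plan.append(best)
    return total, plan

def run_plan(plan: Plan, data: List[int]) -> List[int]:
    """Execute a plan produced by schedule_with_reuse."""
    out: List[int] = []
    for base, cols in plan:
        acc = out[base] if base is not None else 0
        for j in cols:
            acc ^= data[j]
        out.append(acc)
    return out

# Toy 3x4 bit-matrix and four data words (illustrative values only).
matrix = [[1, 1, 1, 0], [1, 1, 1, 1], [0, 1, 1, 1]]
words = [0b1010, 0b0110, 0b0001, 0b1111]
reuse_count, reuse_plan = schedule_with_reuse(matrix)
```

On this toy matrix the from-scratch schedule needs 7 XORs while the reuse plan needs 4, and both produce the same outputs. Real schedulers search a much larger space of intermediate results than this single greedy pass.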

“…WEAVER codes [7] are extremely efficient, but unfortunately require high storage overhead (2x and greater). The storage cost of both HoVer codes [8] and Stepped Combination codes [9] is less than 2x. Among these codes, LRC [10], [11] is known to offer the best (or optimal) trade-off between storage overhead, fault tolerance, and the number of disks involved in reconstruction.…”

Section: Related Work (mentioning)

confidence: 99%

Fast Reconstruction for Degraded Reads and Recovery Process in Primary Array Storage Systems

Sung; Park (2017)

IEICE Trans. Inf. & Syst.

Baegjae Sung (Nonmember) and Chanik Park (Member)

Summary: RAID has been widely deployed in disk array storage systems to manage both performance and reliability simultaneously. RAID conducts two performance-critical operations during disk failures, known as degraded reads/writes and the recovery process. Before the recovery process is complete, reads and writes are degraded because data is reconstructed using data redundancy. The performance of degraded reads/writes is critical in order to meet stipulations in customer service level agreements (SLAs), and the recovery process affects the reliability of a storage system considerably. Both operations require fast data reconstruction. Among the erasure codes for fast reconstruction, Local Reconstruction Codes (LRC) are known to offer the best (or optimal) trade-off between storage overhead, fault tolerance, and the number of disks involved in reconstruction. Originally, LRC was designed for fast reconstruction in distributed cloud storage systems, in which network traffic is a major bottleneck during reconstruction. Thus, LRC focuses on reducing the number of disks involved in data reconstruction, which reduces network traffic. However, we observe that when LRC is applied to primary array storage systems, a major bottleneck in reconstruction results from uneven disk utilization. In other words, underutilized disks can no longer receive I/O requests as a result of the bottleneck of overloaded disks. Uneven disk utilization in LRC is due to its dedicated group partitioning policy to achieve the Maximally Recoverable property. In this paper, we present Distributed Reconstruction Codes (DRC) that support fast reconstruction in primary array storage systems. DRC is designed with a group shuffling policy to solve the problem of uneven disk utilization.
Experiments on real-world workloads show that DRC using global parity rotation (DRC-G) improves degraded performance by as much as 72% compared to RAID-6 and by as much as 35% compared to LRC under the same reliability. In addition, our study shows that DRC-G reduces the recovery process completion time by as much as 52% compared to LRC.

Key words: array storage systems, RAID, erasure codes, fast reconstruction

Introduction

The data protection technique of RAID [1] has been widely deployed in primary disk array storage systems to manage both performance and reliability simultaneously. RAID recovers data when a disk failure occurs by using redundant data (e.g., erasure codes). The recovery process reconstructs data of the failed disk and rebuilds data onto a replacement disk. Simultaneously, RAID serves reads and writes from applications using data reconstruction (i.e., degraded reads/writes). Therefore, fast reconstruction is the main operation that improves both the performance of degraded reads/writes and the recovery process.

In primary array storage systems, the performance of degraded reads/writes during disk failure is critical in order to meet stipulations in customer service level agreements (SLAs). In addition, the perfor...
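The summary's point that local reconstruction touches fewer disks can be illustrated with a small Python sketch. It assumes each local group is protected by a plain XOR parity and omits global parities entirely; the group layout, function names, and block values are hypothetical and are not taken from the LRC or DRC papers.

```python
from functools import reduce
from operator import xor
from typing import List, Tuple

def local_parities(data: List[int], group_size: int) -> List[int]:
    """Split the data blocks into consecutive groups and XOR each group."""
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    return [reduce(xor, g) for g in groups]

def recover_block(data: List[int], parities: List[int],
                  lost: int, group_size: int) -> Tuple[int, int]:
    """Rebuild data[lost] from its own group only.

    Returns (recovered value, number of blocks read)."""
    g = lost // group_size
    start = g * group_size
    end = min(start + group_size, len(data))
    peers = [data[j] for j in range(start, end) if j != lost]
    value = reduce(xor, peers, parities[g])
    return value, len(peers) + 1  # surviving peers plus the local parity

# Six data blocks in two local groups of three (illustrative values only).
blocks = [3, 5, 6, 9, 10, 12]
parities = local_parities(blocks, group_size=3)
value, reads = recover_block(blocks, parities, lost=4, group_size=3)
```

Here the lost block is rebuilt from 3 reads (two surviving group members plus the local parity), whereas a single parity computed over all six data blocks would require reading 6 blocks (the five surviving data blocks plus that parity).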

“…Non-MDS codes have been explored recently because of their reduced I/O costs and applicability to very large systems [GLW10,HCL07,Lub02]. In particular, there are several non-MDS codes that organize blocks of a stripe into a matrix and encode rows (inter-disk) and columns (intra-disk) in an orthogonal manner.…”

Section: Related Work (mentioning)

confidence: 99%

Sector-Disk (SD) Erasure Codes for Mixed Failure Modes in RAID Systems

Plank; Blaum (2014)

ACM Trans. Storage

Traditionally, when storage systems employ erasure codes, they are designed to tolerate the failures of entire disks. However, the most common types of failures are latent sector failures, which only affect individual disk sectors, and block failures, which arise through wear on SSDs. This paper introduces SD codes, which are designed to tolerate combinations of disk and sector failures. As such, they consume far less storage resources than traditional erasure codes. We specify the codes with enough detail for the storage practitioner to employ them, discuss their practical properties, and detail an open-source implementation.
