An Overview of Coding for Distributed Storage Systems

Liu, Shiqiu; Oggier, Frédérique

doi:10.1007/978-3-319-70293-3_14

Cited by 20 publications

(21 citation statements)

References 41 publications

“…See e.g. [18], [19] for surveys on storage specific coding techniques and bounds. Contemporaneously, mechanisms to update parities efficiently were studied [20]- [23].…”

Section: Related Work and Background (mentioning)

confidence: 99%

QLOC: Quorums With Local Reconstruction Codes

2021

In this paper we study the problem of consistency in distributed storage systems relying on erasure coding for storage-efficient fault tolerance. We propose QLOC, a flexible framework for supporting the storage of warm data, i.e., data which, while not very frequently in use, nevertheless continues to be accessed regularly for reads or writes. QLOC builds upon (1) a generic family of local reconstruction codes with guarantees in terms of fault tolerance, efficient recovery from failures and degraded-mode operations, which can be instantiated with parameters customized to requirements such as storage overhead and reliability dictated by user needs and operational environments, and (2) quorum-based consistency mechanisms with support for read-modify-write operations without any underlying atomic primitives, providing deployment choices that trade off fault tolerance, consistency and concurrency requirements. We carry out a theoretical analysis of the code properties, and experimentally benchmark the performance of the consistency enforcement mechanisms, demonstrating the practicality of the proposed approach.

“…LR codes: In [179], the authors compare performance-evaluation results of an (n = 16, k = 12, r = 6) LR code with that of an [n = 16, k = 12] RS code in the Azure production cluster and demonstrate the repair savings offered by the LR code. Subsequently, the authors implemented an (n = 18, k = 14, r = 7) LR code in Windows Azure Storage and showed that this code has repair degree comparable to that of an [9,6] RS code, but has storage overhead 1.29 versus 1.5 in the case of the RS code. This code has reportedly resulted in the savings of millions of dollars for Microsoft [180].…”

Section: Codes In Practice (mentioning)

confidence: 99%

Erasure coding for distributed storage: an overview

Balaji, Krishnan, Vajha et al. 2018

Sci. China Inf. Sci.

In a distributed storage system, code symbols are dispersed across space in nodes or storage units as opposed to time. In settings such as that of a large data center, an important consideration is the efficient repair of a failed node. Efficient repair calls for erasure codes that, in the face of node failure, minimize the amount of repair data transferred over the network, the amount of data accessed at a helper node, and the number of helper nodes contacted. Coding theory has evolved to handle these challenges by introducing two new classes of erasure codes, namely regenerating codes and locally recoverable codes, as well as by coming up with novel ways to repair the ubiquitous Reed-Solomon code. This survey provides an overview of the efforts in this direction that have taken place over the past decade.

I. INTRODUCTION

This survey article deals with the use of erasure coding for the reliable and efficient storage of large amounts of data in settings such as that of a data center. The amount of data stored in a single data center can run into tens or hundreds of petabytes. Reliability of data storage is ensured in part by introducing redundancy in some form, ranging from simple replication to the use of more sophisticated erasure-coding schemes such as Reed-Solomon codes. Minimizing the storage overhead that comes with ensuring reliability is a key consideration in the choice of erasure-coding scheme. More recently a second problem has surfaced, namely, that of node repair.

In [1], [2] the authors study the Facebook warehouse cluster and analyze the frequency of node failures as well as the resultant network traffic relating to node repair. It was observed in [1] that a median of 50 nodes are unavailable per day and that a median of 180 TB of cross-rack traffic is generated as a result of node unavailability. It was also reported that 98.08% of the cases have exactly one block missing in a stripe. The erasure code that was deployed in this instance was an [n = 14, k = 10] Reed-Solomon (RS) code. Here n denotes the block length of the code and k the dimension. The conventional repair of an [n, k] RS code is inefficient in that the repair of a single node calls for contacting k other (helper) nodes and downloading k times the amount of data stored in the failed node. Thus there is significant practical interest in the design of erasure-coding techniques that offer low storage overhead and can also be repaired efficiently.

Coding theorists have responded to this need by coming up with two new classes of codes, namely ReGenerating (RG) and Locally Recoverable (LR) codes. The focus in an RG code is on minimizing the amount of data downloaded to repair a failed node, termed the repair bandwidth, while LR codes seek to minimize the number of helper nodes contacted for node repair, termed the repair degree. In a different direction, coding theorists have also re-examined the problem of node repair in RS codes and have come up with new and more efficient ...
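The figures quoted in this entry can be checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, storage overhead is taken to be n/k, and conventional RS repair is taken to contact k helpers and download k blocks, as the abstract states.

```python
from fractions import Fraction


def storage_overhead(n: int, k: int) -> Fraction:
    """Stored symbols per data symbol for a code of length n and dimension k."""
    return Fraction(n, k)


def conventional_rs_repair_blocks(k: int) -> int:
    """Blocks downloaded to repair one failed node of an [n, k] RS code
    under conventional repair (one block from each of k helper nodes)."""
    return k


# Parameters taken from the quoted citation statement and abstract.
lr_overhead = storage_overhead(18, 14)   # (n = 18, k = 14, r = 7) LR code
rs_overhead = storage_overhead(9, 6)     # [9, 6] RS code

print(round(float(lr_overhead), 2))      # 1.29
print(float(rs_overhead))                # 1.5

# [n = 14, k = 10] RS code from the Facebook warehouse cluster study:
print(conventional_rs_repair_blocks(10))  # 10 blocks read to rebuild 1 block
```

Exact fractions are used so that the comparison between 18/14 and 9/6 does not depend on floating-point rounding; the rounding happens only when printing.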

“…Compared to q-ary LRCs, BLRCs are known to be advantageous in terms of implementation in practical systems. In [43], the advantages of (n, k, d, r) = (15,10,4,6) BLRC are discussed and compared with (16,10,4,5) non-binary LRC, (14,10) RS code, and three-replication with four metrics including encoding complexity, repair complexity, mean time to data loss, and storage capacity. The authors of [43] further analyzed the advantages of BLRCs with a high Hamming distance and average locality [44,45].…”

Section: Binary Locally Repairable Codes (mentioning)

confidence: 99%

“…In some respects, LRC is essentially a block code with an additional parameter referred to as locality. There have been excellent reviews on the distributed storage codes (e.g., [13][14][15][16]). Moreover, a review article on this topic has recently been published [17].…”

Section: Introduction (mentioning)

confidence: 99%

Overview of Binary Locally Repairable Codes for Distributed Storage Systems

Kim 2019

Electronics

This paper summarizes the details of recently proposed binary locally repairable codes (BLRCs) and their features. The construction of codes over a small symbol alphabet is of particular interest for efficient hardware implementation. Therefore, BLRCs are highly noteworthy because no multiplication is required during the encoding, decoding, and repair processes. We explain the various construction approaches of BLRCs, such as cyclic-code-based, bipartite-graph-based, anticode-based, partial-spread-based, and generalized-Hamming-code-based techniques. We also describe code generation methods based on modifications of linear codes such as extending, shortening, expurgating, and augmenting. Finally, we summarize and compare the parameters of the discussed constructions.
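The remark that binary codes need no multiplication can be illustrated with a toy example. The sketch below is not one of the constructions surveyed in the paper: it assumes a single hypothetical local group of three data blocks protected by one XOR parity block, and shows that both encoding and the repair of one lost block use only XOR.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings (addition over GF(2))."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical local group: three data blocks and one local parity block.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)            # encoding: XOR only, no multiplication

# Suppose data[1] is lost; repair it from the other members of the group.
survivors = [data[0], data[2], parity]
repaired = xor_blocks(survivors)     # repair: XOR only, no multiplication

assert repaired == data[1]
```

Here the repair contacts three helper blocks, the size of the local group minus the lost block, rather than every data block in a larger stripe.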
