2006
DOI: 10.1145/1210596.1210599

Improving duplicate elimination in storage systems

Abstract: Elimination of redundant data has become a critical concern in the design of storage architectures. Content addressable storage engines eliminate data at the block level by mapping data blocks with the same content to the same physical storage location. Intelligent object partitioning techniques leverage block level content addressing in order to improve duplicate elimination. In this paper, we propose a novel object partitioning technique, fingerdiff, that is designed to improve storage consumption of existing…
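The abstract describes block-level content addressing: each block is identified by a cryptographic hash of its content, so blocks with identical content resolve to a single stored copy. Below is a minimal Python sketch of that idea, assuming fixed-size blocks and SHA-256 fingerprints; the names (CASStore, put_object) are illustrative and not the paper's API.

```python
# Minimal sketch of block-level content-addressable storage (CAS).
# Assumptions: fixed-size blocks, SHA-256 fingerprints, an in-memory dict
# standing in for physical block storage. Illustrative only.
import hashlib


class CASStore:
    """Stores each distinct block exactly once, keyed by its content hash."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # content hash -> block bytes (the "physical" store)

    def put_object(self, data: bytes):
        """Split data into blocks; return the list of block hashes (the recipe)."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            h = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(h, block)  # duplicate blocks are not re-stored
            recipe.append(h)
        return recipe

    def get_object(self, recipe):
        """Reassemble an object from its block recipe."""
        return b"".join(self.blocks[h] for h in recipe)


if __name__ == "__main__":
    store = CASStore(block_size=8)
    r1 = store.put_object(b"aaaaaaaabbbbbbbb")
    r2 = store.put_object(b"aaaaaaaacccccccc")  # shares its first block with r1
    assert store.get_object(r1) == b"aaaaaaaabbbbbbbb"
    print(len(store.blocks), "distinct blocks stored")  # 3, not 4
```

The store keeps one copy per distinct block and returns a per-object recipe of block hashes; object partitioning techniques such as the one the paper proposes refine how objects are split into the units that get hashed and compared.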

Cited by 120 publications (52 citation statements) | References 17 publications
“…These workloads have been used to evaluate other state-of-the-art deduplication protocols (e.g. [11,3]). In each workload, R consists of a snapshot of a given set of files, whereas S corresponds to a subsequent snapshot of the same files.…”
Section: Discussion
confidence: 99%
“…More recently, more intricate variants of CBH have been proposed that enhance precision and efficiency by relying on multiresolution chunking schemes [11]. Examples include TAPER [11], Hierarchical Substring Caching [10], fingerdiff [3], JumboStore [6] and Wanax [9]. Although we describe HCs in the context of single-resolution, variable-sized CBH, our algorithm can be directly applied to improve all the above solutions.…”
Section: Related Work
confidence: 99%
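The excerpt above refers to single-resolution, variable-size content-based chunking: chunk boundaries are chosen from the data itself with a rolling hash, so an insertion or deletion only disturbs nearby chunks instead of shifting every block boundary. Below is one possible sketch, assuming a simple polynomial rolling hash and illustrative parameters (window size, boundary mask, chunk-size bounds); real deduplication systems typically use Rabin fingerprints and carefully tuned thresholds.

```python
# Sketch of single-resolution, variable-size content-defined chunking.
# Assumptions: polynomial rolling hash over a sliding window, illustrative
# parameters; not the chunker used by any of the cited systems.
import os

WINDOW = 16               # bytes in the sliding window
MASK = 0x1FFF             # cut when (hash & MASK) == 0 -> ~8 KiB average chunks
MIN_CHUNK = 512
MAX_CHUNK = 32 * 1024
PRIME = 1000003
MOD = 1 << 32
POW_WIN = pow(PRIME, WINDOW, MOD)  # weight of the byte that slides out


def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks."""
    start, h = 0, 0
    for i in range(len(data)):
        h = (h * PRIME + data[i]) % MOD
        length = i - start + 1
        if length > WINDOW:
            # slide the window: remove the contribution of the oldest byte
            h = (h - data[i - WINDOW] * POW_WIN) % MOD
        cut = length >= MIN_CHUNK and (h & MASK) == 0
        if cut or length >= MAX_CHUNK:
            yield start, i + 1
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data)


if __name__ == "__main__":
    data = os.urandom(200_000)
    edited = data[:100] + b"X" + data[100:]          # one-byte insertion near the front
    a = {data[s:e] for s, e in chunk_boundaries(data)}
    b = {edited[s:e] for s, e in chunk_boundaries(edited)}
    print(f"{len(a & b)} of {len(a)} chunks survive a one-byte edit")
```

The demo chunks a random buffer before and after a one-byte insertion and reports how many chunks are unchanged; that stability under edits is the property the multi-resolution schemes cited above build on.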
“…3 shows the structure of a backup node in extreme binning de-duplication. When a file has to be backed up, it performs variable size chunking and finds the representative ChunkID and the hash value for the entire file [15], [16]. The representative ChunkID is checked in the primary index; if it is not there, the incoming file is a new one and a new bin is created, with all ChunkIDs, chunk sizes, and pointers to the actual chunks added to the disk.…”
Section: E. File Level De-duplication - Extreme Binning
confidence: 99%
“…The whole file hash value is not modified in the primary index, and the updated bin is written back to the disk. Here every incoming chunk is checked only against the indices of similar files [17]; this approach achieves better throughput than chunk level de-duplication. Since non-traditional backup workloads demand better de-duplication throughput, the file level de-duplication approach is better suited in this case.…”
Section: E. File Level De-duplication - Extreme Binning
confidence: 99%
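Taken together, the two excerpts above outline the extreme binning flow: a file is chunked, a representative ChunkID and a whole-file hash are computed, the representative ChunkID is looked up in a RAM-resident primary index, and incoming chunks are deduplicated only against the single bin associated with that ID. The following is a minimal sketch of that flow under stated assumptions: SHA-1 chunk IDs, the minimum chunk hash as the representative ChunkID, and in-memory dictionaries standing in for the on-disk bins and chunk store; all names are illustrative rather than the cited system's API.

```python
# Sketch of an extreme-binning backup node, under the assumptions stated above.
import hashlib


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class BackupNode:
    def __init__(self):
        # primary index (kept in RAM): representative ChunkID -> (whole-file hash, bin id)
        self.primary_index = {}
        # bins (on disk in the real system): bin id -> {chunk id: (size, pointer)}
        self.bins = {}
        # chunk storage stand-in: chunk id -> chunk bytes
        self.disk = {}

    def backup_file(self, chunks):
        """chunks: list of variable-size chunks produced by content-based chunking."""
        ids = [sha1(c) for c in chunks]
        rep_id = min(ids)                     # representative ChunkID (min hash, assumed)
        file_hash = sha1(b"".join(chunks))    # hash value for the entire file

        if rep_id not in self.primary_index:
            # New file: create a bin holding every ChunkID, size, and pointer.
            bin_id = len(self.bins)
            self.bins[bin_id] = {}
            self.primary_index[rep_id] = (file_hash, bin_id)
            for cid, chunk in zip(ids, chunks):
                self.disk[cid] = chunk
                self.bins[bin_id][cid] = (len(chunk), cid)
            return "new bin created"

        stored_hash, bin_id = self.primary_index[rep_id]
        if stored_hash == file_hash:
            return "whole file is a duplicate"

        # Similar file: check incoming chunks only against this file's bin,
        # never against a global chunk index.
        bin_ = self.bins[bin_id]
        for cid, chunk in zip(ids, chunks):
            if cid not in bin_:
                self.disk[cid] = chunk
                bin_[cid] = (len(chunk), cid)
        # The whole-file hash in the primary index is left unchanged; only the
        # updated bin is written back to disk.
        return "bin updated with new chunks"
```

Because only one bin is read per incoming file, the primary index stays small enough for RAM and lookups avoid a global chunk index, which is the throughput advantage the excerpt attributes to file-level de-duplication.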