2006
DOI: 10.1145/1210596.1210599

Improving duplicate elimination in storage systems

Abstract: Elimination of redundant data has become a critical concern in the design of storage architectures. Content addressable storage engines eliminate data at the block level by mapping data blocks with the same content to the same physical storage location. Intelligent object partitioning techniques leverage block level content addressing in order to improve duplicate elimination. In this paper, we propose a novel object partitioning technique, fingerdiff, that is designed to improve storage consumption of existing…
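The abstract describes block-level content addressing: each block is identified by a cryptographic hash of its content, so blocks with identical content resolve to a single stored copy. Below is a minimal Python sketch of that idea, assuming fixed-size blocks and SHA-256 fingerprints; the names (CASStore, put_object) are illustrative and not the paper's API.

```python
# Minimal sketch of block-level content-addressable storage (CAS).
# Assumptions: fixed-size blocks, SHA-256 fingerprints, an in-memory dict
# standing in for physical block storage. Illustrative only.
import hashlib


class CASStore:
    """Stores each distinct block exactly once, keyed by its content hash."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # content hash -> block bytes (the "physical" store)

    def put_object(self, data: bytes):
        """Split data into blocks; return the list of block hashes (the recipe)."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            h = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(h, block)  # duplicate blocks are not re-stored
            recipe.append(h)
        return recipe

    def get_object(self, recipe):
        """Reassemble an object from its block recipe."""
        return b"".join(self.blocks[h] for h in recipe)


if __name__ == "__main__":
    store = CASStore(block_size=8)
    r1 = store.put_object(b"aaaaaaaabbbbbbbb")
    r2 = store.put_object(b"aaaaaaaacccccccc")  # shares its first block with r1
    assert store.get_object(r1) == b"aaaaaaaabbbbbbbb"
    print(len(store.blocks), "distinct blocks stored")  # 3, not 4
```

The store keeps one copy per distinct block and returns a per-object recipe of block hashes; object partitioning techniques such as the one the paper proposes refine how objects are split into the units that get hashed and compared.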

Cited by 120 publications (52 citation statements) | References 17 publications
“…These workloads have been used to evaluate other state-of-the-art deduplication protocols (e.g. [11,3]). In each workload, R consists of a snapshot of a given set of files, whereas S corresponds to a subsequent snapshot of the same files.…”
Section: Discussion
confidence: 99%
“…More recently, more intricate variants of CBH have been proposed that enhance precision and efficiency by relying on multiresolution chunking schemes [11]. Examples include TAPER [11], Hierarchical Substring Caching [10], fingerdiff [3], JumboStore [6] and Wanax [9]. Although we describe HCs in the context of single-resolution, variable-sized CBH, our algorithm can be directly applied to improve all the above solutions.…”
Section: Related Work
confidence: 99%
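The excerpt above refers to single-resolution, variable-size content-based chunking: chunk boundaries are chosen from the data itself with a rolling hash, so an insertion or deletion only disturbs nearby chunks instead of shifting every block boundary. Below is one possible sketch, assuming a simple polynomial rolling hash and illustrative parameters (window size, boundary mask, chunk-size bounds); real deduplication systems typically use Rabin fingerprints and carefully tuned thresholds.

```python
# Sketch of single-resolution, variable-size content-defined chunking.
# Assumptions: polynomial rolling hash over a sliding window, illustrative
# parameters; not the chunker used by any of the cited systems.
import os

WINDOW = 16               # bytes in the sliding window
MASK = 0x1FFF             # cut when (hash & MASK) == 0 -> ~8 KiB average chunks
MIN_CHUNK = 512
MAX_CHUNK = 32 * 1024
PRIME = 1000003
MOD = 1 << 32
POW_WIN = pow(PRIME, WINDOW, MOD)  # weight of the byte that slides out


def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks."""
    start, h = 0, 0
    for i in range(len(data)):
        h = (h * PRIME + data[i]) % MOD
        length = i - start + 1
        if length > WINDOW:
            # slide the window: remove the contribution of the oldest byte
            h = (h - data[i - WINDOW] * POW_WIN) % MOD
        cut = length >= MIN_CHUNK and (h & MASK) == 0
        if cut or length >= MAX_CHUNK:
            yield start, i + 1
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data)


if __name__ == "__main__":
    data = os.urandom(200_000)
    edited = data[:100] + b"X" + data[100:]          # one-byte insertion near the front
    a = {data[s:e] for s, e in chunk_boundaries(data)}
    b = {edited[s:e] for s, e in chunk_boundaries(edited)}
    print(f"{len(a & b)} of {len(a)} chunks survive a one-byte edit")
```

The demo chunks a random buffer before and after a one-byte insertion and reports how many chunks are unchanged; that stability under edits is the property the multi-resolution schemes cited above build on.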
“…3 shows the structure of a backup node in extreme binning de-duplication. When a file has to be backed up, it performs variable size chunking and finds the representative ChunkID and the hash value for the entire file [15], [16]. The representative ChunkID is checked in the primary index; if it is not there, the incoming file is a new one and a new bin is created, with all ChunkIDs, chunk sizes, and pointers to the actual chunks added to the disk.…”
Section: E. File Level De-duplication - Extreme Binning
confidence: 99%
“…The whole file hash value is not modified in the primary index, and the updated bin is written back to the disk. Here every incoming chunk is checked only against the indices of similar files [17]; this approach achieves better throughput than chunk level de-duplication. Since non-traditional backup workloads demand better de-duplication throughput, the file level de-duplication approach is better suited in this case.…”
Section: E. File Level De-duplication - Extreme Binning
confidence: 99%
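Taken together, the two excerpts above outline the extreme binning flow: a file is chunked, a representative ChunkID and a whole-file hash are computed, the representative ChunkID is looked up in a RAM-resident primary index, and incoming chunks are deduplicated only against the single bin associated with that ID. The following is a minimal sketch of that flow under stated assumptions: SHA-1 chunk IDs, the minimum chunk hash as the representative ChunkID, and in-memory dictionaries standing in for the on-disk bins and chunk store; all names are illustrative rather than the cited system's API.

```python
# Sketch of an extreme-binning backup node, under the assumptions stated above.
import hashlib


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class BackupNode:
    def __init__(self):
        # primary index (kept in RAM): representative ChunkID -> (whole-file hash, bin id)
        self.primary_index = {}
        # bins (on disk in the real system): bin id -> {chunk id: (size, pointer)}
        self.bins = {}
        # chunk storage stand-in: chunk id -> chunk bytes
        self.disk = {}

    def backup_file(self, chunks):
        """chunks: list of variable-size chunks produced by content-based chunking."""
        ids = [sha1(c) for c in chunks]
        rep_id = min(ids)                     # representative ChunkID (min hash, assumed)
        file_hash = sha1(b"".join(chunks))    # hash value for the entire file

        if rep_id not in self.primary_index:
            # New file: create a bin holding every ChunkID, size, and pointer.
            bin_id = len(self.bins)
            self.bins[bin_id] = {}
            self.primary_index[rep_id] = (file_hash, bin_id)
            for cid, chunk in zip(ids, chunks):
                self.disk[cid] = chunk
                self.bins[bin_id][cid] = (len(chunk), cid)
            return "new bin created"

        stored_hash, bin_id = self.primary_index[rep_id]
        if stored_hash == file_hash:
            return "whole file is a duplicate"

        # Similar file: check incoming chunks only against this file's bin,
        # never against a global chunk index.
        bin_ = self.bins[bin_id]
        for cid, chunk in zip(ids, chunks):
            if cid not in bin_:
                self.disk[cid] = chunk
                bin_[cid] = (len(chunk), cid)
        # The whole-file hash in the primary index is left unchanged; only the
        # updated bin is written back to disk.
        return "bin updated with new chunks"
```

Because only one bin is read per incoming file, the primary index stays small enough for RAM and lookups avoid a global chunk index, which is the throughput advantage the excerpt attributes to file-level de-duplication.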