2012
DOI: 10.1145/2078861.2078864

A study of practical deduplication

Abstract: We collected file system content data from 857 desktop computers at Microsoft over a span of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication, particularly considering whole-file versus block-level elimination of redundancy. We found that whole-file deduplication achieves about three quarters of the space savings of the most aggressive block-level deduplication for storage of live file systems, and 87% of the savings for backup images. We also studied file fragmentation fi…

Citations: cited by 355 publications (165 citation statements)
References: 10 publications
“…The limitations of Rabin fingerprint based CDC [20] and MAXP [28] algorithms are superseded by AE algorithm. After chunking process, the cryptographically secure hash-based signature is applied on each chunk to compute its fingerprint [29]. This process is known as fingerprinting.…”
Section: Related Work (mentioning)
confidence: 99%
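The statement above describes a two-step pipeline: split the data into chunks, then hash each chunk to obtain its fingerprint. A minimal sketch of that idea follows. It assumes fixed-size chunking rather than the Rabin-based CDC, MAXP, or AE algorithms named in the quote, and it assumes SHA-256 as the cryptographic hash; the function names `fingerprint_chunks` and `duplicate_fraction` are hypothetical and do not come from the cited work.

```python
import hashlib


def fingerprint_chunks(data: bytes, chunk_size: int = 4096) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-256 hex digest per chunk.

    Assumption: fixed-size chunking stands in for content-defined chunking.
    """
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def duplicate_fraction(data: bytes, chunk_size: int = 4096) -> float:
    """Return the fraction of chunks whose fingerprint repeats an earlier chunk."""
    fingerprints = fingerprint_chunks(data, chunk_size)
    if not fingerprints:
        return 0.0
    return 1 - len(set(fingerprints)) / len(fingerprints)
```

Chunks with equal fingerprints are treated as identical, so only one copy of each distinct fingerprint would need to be stored. Fixed-size chunking is the simplest choice but loses alignment after an insertion, which is the problem content-defined chunking is meant to address.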
“…Hash function will be used to find duplicate files. This algorithm is a fast method with small amount of calculation so it provides high efficiency and low cost [3].…”
Section: Whole File Hashing (mentioning)
confidence: 99%
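Whole-file hashing, as described in the statement above, can be sketched as follows: hash every file's full content and group files that share a digest. This is an illustrative sketch only. SHA-256 is an assumed hash choice, and `file_digest` and `find_duplicate_files` are hypothetical names, not taken from the cited work.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, block_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in blocks to bound memory use."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def find_duplicate_files(root: Path) -> list[list[Path]]:
    """Group regular files under root by content digest and return groups with more than one file."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each returned group holds files with identical content, so a whole-file deduplicating store would keep one copy per group. Files that differ by even one byte land in different groups, which is the redundancy that block-level schemes can additionally recover.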
“…de-duplication) [8,9]. It is a method of compressing information in which the search for copies is carried out across the entire data set rather than within a single file.…”
Section: Data deduplication (unclassified)