2019
DOI: 10.1016/j.future.2019.02.008

Accelerating content-defined-chunking based data deduplication by exploiting parallelism

Abstract: Data deduplication, a data reduction technique that efficiently detects and eliminates redundant data chunks and files, has been widely applied in large-scale storage systems. Most existing deduplication-based storage systems employ content-defined chunking (CDC) and secure-hash-based fingerprinting (e.g., SHA1) to remove redundant data at the chunk level (e.g., 4KB/8KB chunks), which are extremely compute-intensive and thus time-consuming for storage systems. Therefore, we present P-Dedupe, a pipelined and parallelized…
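The abstract summarizes the CDC-plus-SHA-1 workflow that P-Dedupe accelerates. As a rough illustration only (not the paper's implementation), the Python sketch below splits a byte stream at content-defined boundaries with a Gear-style rolling hash and deduplicates chunks by SHA-1 fingerprint; the table derivation, chunk-size limits, and mask value are assumed placeholders.

```python
import hashlib

# Deterministic 256-entry table for a Gear-style rolling hash.
# All parameters below are illustrative placeholders, not the paper's values.
GEAR = [int.from_bytes(hashlib.sha1(bytes([i])).digest()[:4], "big")
        for i in range(256)]
MIN_CHUNK, AVG_MASK, MAX_CHUNK = 2 * 1024, 0x1FFF, 64 * 1024  # ~8KB average

def cdc_chunks(data: bytes):
    """Split data into variable-sized chunks at content-defined boundaries."""
    start, fp = 0, 0
    for i in range(len(data)):
        fp = ((fp << 1) + GEAR[data[i]]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (fp & AVG_MASK) == 0) or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start, fp = i + 1, 0
    if start < len(data):
        yield data[start:]

def deduplicate(data: bytes, index: dict):
    """Fingerprint every chunk with SHA-1 and keep only unseen chunks."""
    unique = duplicate = 0
    for chunk in cdc_chunks(data):
        digest = hashlib.sha1(chunk).hexdigest()
        if digest in index:
            duplicate += 1            # redundant chunk: reference existing copy
        else:
            index[digest] = chunk     # new chunk: store and index it
            unique += 1
    return unique, duplicate
```

Both the rolling-hash boundary test and the SHA-1 fingerprint computation scan every byte, which is why the paper treats chunking and fingerprinting as the compute-intensive stages worth parallelizing.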

Cited by 34 publications (13 citation statements)
References 24 publications
“…Xia et al. [7] proposed 'P-Dedupe', a parallelized data deduplication method that speeds up deduplication by dividing it into four stages: chunking, fingerprinting, indexing, and writing. These four stages are pipelined at the granularity of chunks and files, and the CDC (Content-Defined Chunking) and secure-hash-based fingerprinting stages are further parallelized to ease the computational bottleneck.…”
Section: Related Work
confidence: 99%
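The statement above describes P-Dedupe's four-stage pipeline (chunking, fingerprinting, indexing, writing) with parallelized CDC and fingerprinting. The sketch below shows one way such a pipeline could be wired up with threads and bounded queues; the queue sizes, worker count, and the `pipeline_dedupe`, `chunker`, `index`, and `store` names are assumptions for illustration, not the paper's actual design.

```python
import hashlib
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

SENTINEL = None  # end-of-stream marker passed between pipeline stages

def pipeline_dedupe(segments, chunker, index, store):
    """Run chunking -> fingerprinting -> indexing -> writing as a pipeline of
    threads connected by bounded queues; the fingerprinting stage fans out
    over a small thread pool so independent chunks are hashed in parallel."""
    chunk_q = queue.Queue(maxsize=64)
    fp_q = queue.Queue(maxsize=64)
    write_q = queue.Queue(maxsize=64)

    def chunking_stage():
        for segment in segments:
            for chunk in chunker(segment):      # content-defined chunking
                chunk_q.put(chunk)
        chunk_q.put(SENTINEL)

    def fingerprint_stage():
        with ThreadPoolExecutor(max_workers=4) as pool:
            while (chunk := chunk_q.get()) is not SENTINEL:
                # submit() returns a future immediately, so hashing of
                # consecutive chunks overlaps across the worker threads
                fp_q.put((pool.submit(hashlib.sha1, chunk), chunk))
        fp_q.put(SENTINEL)

    def index_stage():
        while (item := fp_q.get()) is not SENTINEL:
            future, chunk = item
            digest = future.result().hexdigest()
            if digest not in index:             # unseen fingerprint
                index[digest] = True
                write_q.put((digest, chunk))
        write_q.put(SENTINEL)

    def write_stage():
        while (item := write_q.get()) is not SENTINEL:
            digest, chunk = item
            store[digest] = chunk               # persist only unique chunks

    stages = (chunking_stage, fingerprint_stage, index_stage, write_stage)
    threads = [threading.Thread(target=s) for s in stages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In a toy run, a `chunker` such as the `cdc_chunks` sketch above would be passed in, with plain dictionaries serving as `index` and `store`; bounded queues keep the stages decoupled while providing backpressure between them.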
“…Improves the detection mechanism at the decryption end to exactly identify redundant files; drawback: decrease in network lifetime. Xia et al. [11]: pipelined and parallelized data de-duplication scheme.…”
Section: Zheng et al. [10]
confidence: 99%
“…De-duplication is a storage optimization technique that avoids storing identical replicas of data [9]. The main task in de-duplication is the partitioning of data [11], [18], [22]. Since de-duplication is applied to data that must be protected, various security measures are also considered [10], [12]–[15], [19].…”
Section: Jiang et al. [18]
confidence: 99%
“…According to whether the length of the chunks is fixed, chunking algorithms can be classified into fixed-length chunking algorithms and content-defined chunking (CDC) algorithms [3]. Fixed-length chunking, as the name implies, divides the file into chunks of fixed length.…”
Section: Chunking Algorithms Classification
confidence: 99%
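To illustrate the distinction drawn in this statement, the sketch below (an assumed example, not taken from the cited papers) counts how many chunk fingerprints two versions of a file share after a one-byte insertion: fixed-length chunking shifts every later boundary, while content-defined boundaries realign. It reuses the `cdc_chunks` helper sketched earlier.

```python
import hashlib
import os

def fixed_chunks(data: bytes, size: int = 8192):
    """Fixed-length chunking: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def shared_fingerprints(a: bytes, b: bytes, chunker) -> int:
    """Count how many SHA-1 chunk fingerprints two byte strings share."""
    fps_a = {hashlib.sha1(c).hexdigest() for c in chunker(a)}
    fps_b = {hashlib.sha1(c).hexdigest() for c in chunker(b)}
    return len(fps_a & fps_b)

# Insert one byte near the start of a ~1 MB synthetic file.  Fixed-length
# chunking shifts all later boundaries, so almost no chunks still match;
# content-defined boundaries depend only on local content and realign.
original = os.urandom(1 << 20)
edited = original[:100] + b"X" + original[100:]
print("fixed-length shared chunks:",
      shared_fingerprints(original, edited, fixed_chunks))
print("content-defined shared chunks:",
      shared_fingerprints(original, edited, cdc_chunks))
```

This boundary-shift behaviour is the usual argument for CDC over fixed-length chunking in deduplication, at the cost of the rolling-hash computation that P-Dedupe seeks to parallelize.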