2016
DOI: 10.3390/sym8070069

A Logistic Based Mathematical Model to Optimize Duplicate Elimination Ratio in Content Defined Chunking Based Big Data Storage System

Abstract: Deduplication is an efficient data reduction technique used to mitigate the problem of huge data volumes in big data storage systems. Content defined chunking (CDC) is the most widely used algorithm in deduplication systems. The expected chunk size is an important parameter of CDC, and it significantly influences the duplicate elimination ratio (DER). We collected two realistic datasets to perform an experiment. The experimental results showed that the current approach of setting the expecte…

Cited by 7 publications (2 citation statements); references 8 publications.
“…The previously suggested 4 KB or 8 KB chunk size did not offer the best optimization for the deduplication ratio, so Wang et al. [38] presented a mathematical model based on logistics to improve the deduplication ratio based on the expected chunk size. They used two real-world datasets to verify that the model was accurate, and the results showed that the R² value was higher than 0.9.…”
Section: Related Work (mentioning)
confidence: 99%
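The statement above reports goodness of fit as R². The exact logistic form used by Wang et al. is not given on this page, so the sketch below assumes a hypothetical form DER = 1 + a / (1 + exp(k(x - x0))), with x = log2(expected chunk size), fits it by grid-searching the amplitude `a` and solving `k` and `x0` by linearized least squares, and then computes R². The names `logistic_der`, `fit_logistic`, and `r_squared` and the fitting strategy are illustrative assumptions; a general-purpose optimizer such as `scipy.optimize.curve_fit` would be a more conventional choice.

```python
import math
from typing import Sequence, Tuple


def logistic_der(x: float, a: float, k: float, x0: float) -> float:
    """Hypothetical model: DER = 1 + a / (1 + exp(k * (x - x0))), x = log2(chunk size)."""
    return 1.0 + a / (1.0 + math.exp(k * (x - x0)))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def fit_logistic(xs: Sequence[float], ders: Sequence[float]) -> Tuple[float, float, float]:
    """Grid-search a; for each a, linearize ln(a / y - 1) = k*x - k*x0 and fit by OLS.

    Requires every DER > 1 and at least two distinct x values.
    """
    ys = [d - 1.0 for d in ders]
    y_max = max(ys)
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    best = None
    for step in range(1, 2001):
        a = y_max * (1.0 + step / 1000.0)  # a must exceed every y for the log to exist
        zs = [math.log(a / y - 1.0) for y in ys]
        mz = sum(zs) / n
        k = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / sxx  # OLS slope
        if k == 0.0:
            continue
        x0 = mx - mz / k  # intercept c = mz - k*mx equals -k*x0
        # Keep the candidate with the smallest squared error in the original DER space.
        sse = sum((d - logistic_der(x, a, k, x0)) ** 2 for x, d in zip(xs, ders))
        if best is None or sse < best[0]:
            best = (sse, a, k, x0)
    return best[1], best[2], best[3]
```

In use, `xs` would hold log2 of each tested expected chunk size (for example 9 through 16 for 512 B to 64 KB) and `ders` the measured DER at each size; `r_squared(ders, [logistic_der(x, *fit_logistic(xs, ders)) for x in xs])` then gives the fit quality.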
“…During the implementation of CDC, the vital parameter, called expected chunk size, will definitely and significantly affect the duplicate elimination ratio (DER). For an improvement, Wang et al [21] uncovered the hidden relationship between DER and expected chunk size through designing a logistic based mathematical model to provide a theoretical basis for this kind of method. Their experimental results showed that the logistic mathematical way is correct and reasonable for choosing the expected chunk size and reaching the goal of optimizing DER.…”
Section: Related Work (mentioning)
confidence: 99%
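To illustrate how a fitted curve could guide the choice of expected chunk size, the sketch below applies a hypothetical selection rule: take the largest power-of-two size whose predicted DER stays within a relative tolerance of the best predicted DER, on the assumption that larger chunks mean fewer fingerprints to index. The rule, the tolerance, the candidate range, the model form, and `choose_expected_chunk_size` itself are assumptions, not the procedure described by Wang et al.

```python
import math
from typing import Iterable


def logistic_der(x: float, a: float, k: float, x0: float) -> float:
    """Hypothetical model: DER = 1 + a / (1 + exp(k * (x - x0))), x = log2(chunk size)."""
    return 1.0 + a / (1.0 + math.exp(k * (x - x0)))


def choose_expected_chunk_size(a: float, k: float, x0: float,
                               candidates_log2: Iterable[int] = range(9, 17),
                               tolerance: float = 0.05) -> int:
    """Largest 2**x whose predicted DER is within `tolerance` (relative) of the best one."""
    preds = {x: logistic_der(x, a, k, x0) for x in candidates_log2}
    best = max(preds.values())
    eligible = [x for x, d in preds.items() if d >= best * (1.0 - tolerance)]
    return 2 ** max(eligible)
```

With the hypothetical parameters a = 2.0, k = 0.8, x0 = 12.5 and the default 5% tolerance, `choose_expected_chunk_size(2.0, 0.8, 12.5)` returns 1024; with `tolerance=0.0` it falls back to the size with the highest predicted DER (512 here).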