2019
DOI: 10.1016/j.future.2019.02.008

Accelerating content-defined-chunking based data deduplication by exploiting parallelism

Abstract: Data deduplication, a data reduction technique that efficiently detects and eliminates redundant data chunks and files, has been widely applied in large-scale storage systems. Most existing deduplication-based storage systems employ content-defined chunking (CDC) and secure-hash-based fingerprinting (e.g., SHA1) to remove redundant data at the chunk level (e.g., 4KB/8KB chunks), which are extremely compute-intensive and thus time-consuming for storage systems. Therefore, we present P-Dedupe, a pipelined and parallelized…
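The abstract summarizes the CDC-plus-SHA-1 workflow that P-Dedupe accelerates. As a rough illustration only (not the paper's implementation), the Python sketch below splits a byte stream at content-defined boundaries with a Gear-style rolling hash and deduplicates chunks by SHA-1 fingerprint; the table derivation, chunk-size limits, and mask value are assumed placeholders.

```python
import hashlib

# Deterministic 256-entry table for a Gear-style rolling hash.
# All parameters below are illustrative placeholders, not the paper's values.
GEAR = [int.from_bytes(hashlib.sha1(bytes([i])).digest()[:4], "big")
        for i in range(256)]
MIN_CHUNK, AVG_MASK, MAX_CHUNK = 2 * 1024, 0x1FFF, 64 * 1024  # ~8KB average

def cdc_chunks(data: bytes):
    """Split data into variable-sized chunks at content-defined boundaries."""
    start, fp = 0, 0
    for i in range(len(data)):
        fp = ((fp << 1) + GEAR[data[i]]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (fp & AVG_MASK) == 0) or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start, fp = i + 1, 0
    if start < len(data):
        yield data[start:]

def deduplicate(data: bytes, index: dict):
    """Fingerprint every chunk with SHA-1 and keep only unseen chunks."""
    unique = duplicate = 0
    for chunk in cdc_chunks(data):
        digest = hashlib.sha1(chunk).hexdigest()
        if digest in index:
            duplicate += 1            # redundant chunk: reference existing copy
        else:
            index[digest] = chunk     # new chunk: store and index it
            unique += 1
    return unique, duplicate
```

Both the rolling-hash boundary test and the SHA-1 fingerprint computation scan every byte, which is why the paper treats chunking and fingerprinting as the compute-intensive stages worth parallelizing.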

Cited by 34 publications (13 citation statements)
References 24 publications
“…Xia et al. [7] proposed 'P-Dedupe', a parallelized data deduplication method that speeds up deduplication by dividing it into four stages: chunking, fingerprinting, indexing, and writing. These four stages are pipelined at the granularity of chunks and files, and the CDC (Content-Defined Chunking) and secure-hash-based fingerprinting stages are further parallelized to ease the computational bottleneck.…”
Section: Related Work
confidence: 99%
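The statement above describes P-Dedupe's four-stage pipeline (chunking, fingerprinting, indexing, writing) with parallelized CDC and fingerprinting. The sketch below shows one way such a pipeline could be wired up with threads and bounded queues; the queue sizes, worker count, and the `pipeline_dedupe`, `chunker`, `index`, and `store` names are assumptions for illustration, not the paper's actual design.

```python
import hashlib
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

SENTINEL = None  # end-of-stream marker passed between pipeline stages

def pipeline_dedupe(segments, chunker, index, store):
    """Run chunking -> fingerprinting -> indexing -> writing as a pipeline of
    threads connected by bounded queues; the fingerprinting stage fans out
    over a small thread pool so independent chunks are hashed in parallel."""
    chunk_q = queue.Queue(maxsize=64)
    fp_q = queue.Queue(maxsize=64)
    write_q = queue.Queue(maxsize=64)

    def chunking_stage():
        for segment in segments:
            for chunk in chunker(segment):      # content-defined chunking
                chunk_q.put(chunk)
        chunk_q.put(SENTINEL)

    def fingerprint_stage():
        with ThreadPoolExecutor(max_workers=4) as pool:
            while (chunk := chunk_q.get()) is not SENTINEL:
                # submit() returns a future immediately, so hashing of
                # consecutive chunks overlaps across the worker threads
                fp_q.put((pool.submit(hashlib.sha1, chunk), chunk))
        fp_q.put(SENTINEL)

    def index_stage():
        while (item := fp_q.get()) is not SENTINEL:
            future, chunk = item
            digest = future.result().hexdigest()
            if digest not in index:             # unseen fingerprint
                index[digest] = True
                write_q.put((digest, chunk))
        write_q.put(SENTINEL)

    def write_stage():
        while (item := write_q.get()) is not SENTINEL:
            digest, chunk = item
            store[digest] = chunk               # persist only unique chunks

    stages = (chunking_stage, fingerprint_stage, index_stage, write_stage)
    threads = [threading.Thread(target=s) for s in stages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In a toy run, a `chunker` such as the `cdc_chunks` sketch above would be passed in, with plain dictionaries serving as `index` and `store`; bounded queues keep the stages decoupled while providing backpressure between them.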
“…Improves the detection mechanism at the decryption end to exactly identify redundant files; drawback: decrease in network lifetime. Xia et al. [11]: pipelined and parallelized data de-duplication scheme.…”
Section: Zheng et al. [10]
confidence: 99%
“…De-duplication is a storage optimization technique that avoids storing identical replicas of data [9]. The main task in de-duplication is the partitioning of data [11], [18], [22]. Since de-duplication is applied to data that must be protected, various security measures are also considered [10], [12]–[15], [19].…”
Section: Jiang et al. [18]
confidence: 99%
“…According to whether the length of the chunks is fixed, chunking algorithms can be classified into fixed-length chunking algorithms and content-defined chunking (CDC) algorithms [3]. Fixed-length chunking, as the name implies, divides the file into chunks of fixed length.…”
Section: Chunking Algorithms Classification
confidence: 99%
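To illustrate the distinction drawn in this statement, the sketch below (an assumed example, not taken from the cited papers) counts how many chunk fingerprints two versions of a file share after a one-byte insertion: fixed-length chunking shifts every later boundary, while content-defined boundaries realign. It reuses the `cdc_chunks` helper sketched earlier.

```python
import hashlib
import os

def fixed_chunks(data: bytes, size: int = 8192):
    """Fixed-length chunking: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def shared_fingerprints(a: bytes, b: bytes, chunker) -> int:
    """Count how many SHA-1 chunk fingerprints two byte strings share."""
    fps_a = {hashlib.sha1(c).hexdigest() for c in chunker(a)}
    fps_b = {hashlib.sha1(c).hexdigest() for c in chunker(b)}
    return len(fps_a & fps_b)

# Insert one byte near the start of a ~1 MB synthetic file.  Fixed-length
# chunking shifts all later boundaries, so almost no chunks still match;
# content-defined boundaries depend only on local content and realign.
original = os.urandom(1 << 20)
edited = original[:100] + b"X" + original[100:]
print("fixed-length shared chunks:",
      shared_fingerprints(original, edited, fixed_chunks))
print("content-defined shared chunks:",
      shared_fingerprints(original, edited, cdc_chunks))
```

This boundary-shift behaviour is the usual argument for CDC over fixed-length chunking in deduplication, at the cost of the rolling-hash computation that P-Dedupe seeks to parallelize.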