Research on Multifeature Data Routing Strategy in Deduplication

He, Qinlu; Bian, Genqing; Shao, Bilin; Zhang, Weiqi

doi:10.1155/2020/8869237

Cited by 4 publications

(6 citation statements)

References 12 publications


“…Figure 4-1 shows the deduplication rate between the stateful EMC-stateful, stateless EMC-stateless, and DRMF [19] algorithms based on the EMC scheme and the data frequency-based classification routing algorithm DRDF.…”

Section: Results (mentioning)

confidence: 99%

“…This scheme can obtain a high deduplication rate under the premise of maintaining a balanced data distribution. However, similar to the broadcast system communication overhead and frequent block fingerprint query, it seriously affects the deduplication performance of the cluster [18][19].…”

Section: II. Related Work (mentioning)

confidence: 99%

“…Figure 4-2 shows the logic time comparison of the DRDF and DRMF [19] and EMC stateful and stateless routing algorithms for processing about 11 million fingerprint data. EMC-stateful has to query storage nodes every time routing data, so it needs more time overhead.…”

Section: Figure 4-1 Comparison of Algorithm Deduplication Rate (mentioning)

confidence: 99%


Research on Routing Strategy in Cluster Deduplication System

Bian, Zhang, et al. 2021

IEEE Access

Self Cite


A cluster deduplication system can coordinate the work of multiple nodes, which can better alleviate the disk index bottleneck existing in large-scale data backup systems. However, the nodes become isolated islands of information during data deduplication. When the servers route data in query mode, ensuring a high deduplication rate requires a large amount of system overhead and results in low throughput. Conversely, the servers cannot obtain a high deduplication rate if they adopt the stateless routing method. The data routing strategy can therefore greatly affect the overall performance of the system. This paper proposes the concept of data frequency and designs a classified routing strategy. The metadata server maintains a byte-shaped Bloom filter that records the occurrence frequency of data blocks. The values in the Bloom filter are counted, and the frequency of each data block is then compared with the configured threshold value to determine whether the data is "hot data". We use stateful routing to send "cold data" to the storage nodes and stateless routing to send the hot data to the storage nodes. Experimental results show that the classifying routing algorithm based on data frequency can greatly reduce the overhead of the system while guaranteeing the deduplication rate of the deduplication system, and that it improves system throughput and real-time processing capabilities. Compared with the fully stateful routing scheme, our method loses less than 2% of the deduplication rate while reducing the communication query overhead by more than 25% and improving the real-time processing capability of the storage system.
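The frequency-based classification described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `CountingBloomFilter`, the function `choose_routing`, the slot count, the number of hash functions, the SHA-256 salting scheme, and the threshold of 3 are all assumptions chosen for illustration. The only details taken from the abstract are that per-block frequencies are kept in a Bloom filter with byte-sized values, compared against a configured threshold, and that cold data goes through stateful routing while hot data goes through stateless routing.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose slots are one-byte counters (saturating at 255)."""

    def __init__(self, num_slots=1 << 16, num_hashes=4):
        self.slots = bytearray(num_slots)
        self.num_hashes = num_hashes

    def _positions(self, fingerprint: bytes):
        # Derive num_hashes slot indexes by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.slots)

    def add(self, fingerprint: bytes) -> None:
        # Increment every slot for this fingerprint, stopping at the byte limit.
        for pos in self._positions(fingerprint):
            if self.slots[pos] < 255:
                self.slots[pos] += 1

    def estimate(self, fingerprint: bytes) -> int:
        # The smallest counter never undercounts (below saturation), but
        # collisions with other fingerprints can make it overcount.
        return min(self.slots[pos] for pos in self._positions(fingerprint))


def choose_routing(cbf: CountingBloomFilter, fingerprint: bytes,
                   hot_threshold: int = 3) -> str:
    """Record one occurrence of the block and pick a routing mode for it."""
    cbf.add(fingerprint)
    if cbf.estimate(fingerprint) >= hot_threshold:
        return "stateless"  # hot data, as stated in the abstract
    return "stateful"       # cold data, as stated in the abstract


cbf = CountingBloomFilter()
decisions = [choose_routing(cbf, b"block-a") for _ in range(3)]
print(decisions)
```

With an otherwise empty filter, the block is routed statefully until its estimated frequency reaches the threshold, after which it is treated as hot.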


“…Firstly, we test the comparison of the deduplication rate of the algorithms proposed in this paper under different storage nodes. As shown in Figure 4-1, Stateful denotes the stateful data routing policy, DRMF [19] algorithms and Σ-Dedupe [12] algorithms, Stateless denotes the stateless data routing policy proposed in this paper, and DBF denotes the global BloomFilter-based data routing policy proposed in this paper. In the experiments, five storage nodes, 8 nodes, 16 nodes, 32 nodes, 64 nodes, 128 nodes, 256 nodes, and 512 nodes, are selected to verify the number of nodes.…”

Section: Volume XX 2017 (mentioning)

confidence: 99%

“…Both approaches have their pros and cons, but to ensure the parallelism of the system and low system overhead, the current cluster deduplication system mainly uses the former for data routing [3][4][5]. Implement the routing strategy requires maintaining fingerprint indexes of data blocks in the memory of storage nodes, which the size of data block is 4KB and the fingerprint size of each data block is about 40B.…”

Section: Introduction (mentioning)

confidence: 99%
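The figures quoted in the statement above (4 KB data blocks, about 40 B per fingerprint) give a rough feel for how large an in-memory fingerprint index becomes. The sketch below is only back-of-the-envelope arithmetic: the function name `fingerprint_index_bytes` and the 1 TiB example volume are assumptions, and index overhead beyond the raw fingerprints is ignored.

```python
def fingerprint_index_bytes(unique_data_bytes: int,
                            block_size: int = 4 * 1024,
                            fingerprint_size: int = 40) -> int:
    """Raw size of a fingerprint index for the given amount of unique data."""
    blocks = -(-unique_data_bytes // block_size)  # ceiling division
    return blocks * fingerprint_size


one_tib = 1 << 40  # hypothetical volume of unique data
index_gib = fingerprint_index_bytes(one_tib) / (1 << 30)
print(index_gib)  # 10.0
```

Under these assumptions, every tebibyte of unique data adds roughly 10 GiB of fingerprints, which is why the index is hard to keep in the memory of a single node.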

Research on Data Routing Strategy of Deduplication in Cloud Environment

Zhang, Bian, et al. 2022

IEEE Access

Self Cite


The application of data deduplication technology reduces the demand for data storage and improves resource utilization. Compared with the limited storage and computing capacity of a single node, cluster data deduplication technology has great advantages. However, cluster data deduplication technology also brings new issues of deduplication rate reduction and load balancing of storage nodes. A data routing strategy can balance deduplication rate against load balancing. Therefore, this paper proposes a data routing strategy based on a distributed Bloom Filter. 1) Superchunk is used as the basic unit of data routing to improve system throughput. According to Broder's theorem, the k smallest fingerprints are selected as the Superchunk features and sent to the storage node. The optimal node is selected as the routing node by matching the BloomFilter maintained in the memory of the storage node and by considering the storage capacity of the node. 2) Design and implement system prototypes. The specific parameters of all kinds of routing strategies are obtained through experiments, and the routing strategies proposed in this paper are tested. The theoretical analysis and experimental results prove the feasibility of the strategies proposed in this paper. Compared with the other routing strategies, our method improves the deduplication rate by 3%, reduces the communication query overhead by more than 36%, and improves the load balancing degree of the storage system.
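The Superchunk feature selection and node choice described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: SHA-1 as the chunk fingerprint, `k=4`, the helper names `superchunk_features` and `pick_node`, and the scoring rule (most matched features first, least used storage as tie-breaker) are all hypothetical. A plain Python `set` stands in for each node's in-memory Bloom filter.

```python
import hashlib
import heapq


def chunk_fingerprint(chunk: bytes) -> bytes:
    # Assumed fingerprint function; the abstract does not name one.
    return hashlib.sha1(chunk).digest()


def superchunk_features(chunks, k=4):
    """Return the k smallest distinct fingerprints of a superchunk."""
    fingerprints = {chunk_fingerprint(c) for c in chunks}
    return heapq.nsmallest(k, fingerprints)


def pick_node(features, nodes):
    """Pick the node that matches the most features.

    nodes maps node_id -> (known_fingerprints, used_bytes). Ties are broken
    by lower storage usage, then by node id for determinism.
    """
    def score(item):
        node_id, (known, used_bytes) = item
        matches = sum(1 for f in features if f in known)
        return (-matches, used_bytes, node_id)

    return min(nodes.items(), key=score)[0]


chunks = [b"chunk-%d" % i for i in range(32)]
features = superchunk_features(chunks, k=4)
nodes = {
    "node-a": (set(features[:2]), 500),  # already holds two of the features
    "node-b": (set(), 100),              # emptier, but shares nothing
}
print(pick_node(features, nodes))  # node-a
```

Matching first and storage usage second is one simple way to trade deduplication rate against load balance; a weighted combination of the two would be another.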

Research on Global BloomFilter-Based Data Routing Strategy of Deduplication in Cloud Environment

Chen et al. 2023

IETE Journal of Research

