2020
DOI: 10.1155/2020/8869237

Research on Multifeature Data Routing Strategy in Deduplication

Abstract: Deduplication is a popular data reduction technology in storage systems which has significant advantages, such as finding and eliminating duplicate data, reducing data storage capacity required, increasing resource utilization, and saving storage costs. The file features are a key factor that is used to calculate the similarity between files, but the similarity calculated by the single feature has some limitations especially for the similar files. The storage node feature reflects the load condition of the nod…

Cited by 4 publications (6 citation statements)
References 12 publications
“…Figure 4-1 shows the deduplication rate between the stateful EMC-stateful, stateless EMC-stateless, and DRMF [19] algorithms based on the EMC scheme and the data frequency-based classification routing algorithm DRDF.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…This scheme can obtain a high deduplication rate under the premise of maintaining a balanced data distribution. However, similar to the broadcast system communication overhead and frequent block fingerprint query, it seriously affects the deduplication performance of the cluster [18][19].…”
Section: II. Related Work
Citation type: mentioning
confidence: 99%
“…Firstly, we test the comparison of the deduplication rate of the algorithms proposed in this paper under different storage nodes. As shown in Figure 4-1, Stateful denotes the stateful data routing policy, DRMF [19] algorithms and Σ-Dedupe [12] algorithms, Stateless denotes the stateless data routing policy proposed in this paper, and DBF denotes the global BloomFilter-based data routing policy proposed in this paper. In the experiments, five storage nodes, 8 nodes, 16 nodes, 32 nodes, 64 nodes, 128 nodes, 256 nodes, and 512 nodes, are selected to verify the number of nodes.…”
Section: Volume XX 2017
Citation type: mentioning
confidence: 99%
“…Both approaches have their pros and cons, but to ensure the parallelism of the system and low system overhead, the current cluster deduplication system mainly uses the former for data routing [3][4][5]. Implement the routing strategy requires maintaining fingerprint indexes of data blocks in the memory of storage nodes, which the size of data block is 4KB and the fingerprint size of each data block is about 40B.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
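
The last statement above quotes a 4KB block size and a fingerprint of about 40B per block. As a rough illustration of why keeping the fingerprint index in memory is costly, the arithmetic can be sketched as follows. This is a minimal sketch, not taken from any of the cited papers: the function name is hypothetical, and it assumes the worst case in which every block is unique and the index stores nothing beyond the fingerprint itself.

```python
def fingerprint_index_bytes(data_bytes: int,
                            block_bytes: int = 4 * 1024,
                            fingerprint_bytes: int = 40) -> int:
    """Estimate the memory needed to hold one fingerprint per block.

    Assumption (not from the source): every block is unique and the
    index has no per-entry overhead beyond the fingerprint.
    """
    # Ceiling division: a partially filled final block still needs an entry.
    blocks = -(-data_bytes // block_bytes)
    return blocks * fingerprint_bytes


TIB = 1024 ** 4
GIB = 1024 ** 3

index_bytes = fingerprint_index_bytes(TIB)
print(f"{index_bytes / GIB:.1f} GiB of index per TiB of unique data")
```

Under these assumptions, 1 TiB of unique data splits into 2^28 blocks, which gives about 10 GiB of fingerprints. A real index would likely need more because of hash-table or pointer overhead, and less wherever duplicate blocks share an entry.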