An Efficient Data Location Protocol for Self-organizing Storage Clusters

Tang, Hong; Yang, Tao

doi:10.1145/1048935.1050203

Cited by 21 publications

(18 citation statements)

References 48 publications


“…tolerance [1,4,6,7,9,10,12,17,24,25]. These systems do not consider the impact of the likely dual-role characteristics of cluster nodes, serving both as a compute node and also as a data server.…”

mentioning

confidence: 99%

Exploiting redundancy to boost performance in a RAID-10 style cluster-based file system

et al. 2006


While aggregating the throughput of existing disks on cluster nodes is a cost-effective approach to alleviate the I/O bottleneck in cluster computing, this approach suffers from potential performance degradations due to contentions for shared resources on the same node between storage data processing and user task computation. This paper proposes to judiciously utilize the storage redundancy in the form of mirroring existed in a RAID-10 style file system to alleviate this performance degradation. More specifically, a heuristic scheduling algorithm is developed, motivated from the observations of a simple cluster configuration, to spatially schedule write operations on the nodes with less load among each mirroring pair. The duplication of modified data to the mirroring nodes is performed asynchronously in the background. The read performance is improved by two techniques: doubling the degree of parallelism and hot-spot skipping. A synthetic benchmark is used to evaluate these algorithms in a real cluster environment and the proposed algorithms are shown to be very effective in performance enhancement.


“…SCAD-DAR uses remapping functions similar in flavor to those in RUSH, but does not support replication beyond simple offset-based replication as discussed in Section 3.3. Consistent hashing [11] has many of the qualities of RUSH, but has a high migration overhead and is less well-suited to read-write file systems; Tang and Yang [18] use consistent hashing as part of a scheme to distribute data in large-scale storage clusters.…”

Section: Related Work

mentioning

confidence: 99%

Replication under scalable hashing: a family of algorithms for scalable decentralized data distribution

Honicky

Miller

18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.


“…In these experiments, we only added new objects without deleting any existing items so that δ 0 is kept zero. The experiments presented in Table 2 considers both the deletion and addition of objects on each host when the initial state of BF on each host is optimized, this is, the number of hash functions is the optimal under the ratio between m and the initial number of objects n. This specific setting aims to emulate the real application where m/n and k are usually optimally or sub-optimally matched by dynamically adjusting the BF length m [3] or designing the BF length according to the average number of objects [12,6,2,4,5]. All the analytical results have been very closely matched by their real (experimental) counterparts consistently, strongly validating our theoretical models.…”

Section: Validation Of the Theoretic Models Via Experiments

mentioning

confidence: 99%

“…Ref. [3,4,5,6] use BFs to implement the function of mapping logical data identities to their physical locations in distributed storage systems. In such schemes, each storage node constructs a Bloom filter that summarizes the identities of data stored locally and broadcasts it to other nodes.…”

Section: Introduction

mentioning

confidence: 99%

False Rate Analysis of Bloom Filter Replicas in Distributed Systems

Zhu

Jiang

2006 International Conference on Parallel Processing (ICPP'06)
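The Bloom-filter scheme described in the citation statements above (each storage node summarizes its locally stored identities in a filter and broadcasts it to its peers) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the protocol of any of the cited papers: the class names `BloomFilter` and `StorageNode`, the SHA-256 double-hashing construction, and the in-process "broadcast" are all hypothetical choices. The helper `optimal_k` uses the standard result that, for filter length m and n stored objects, the false-positive rate is minimized at about k = (m/n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter with m bits and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str) -> list[int]:
        # Assumption: derive k bit positions by double hashing two
        # 64-bit slices of a single SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))


def optimal_k(m: int, n: int) -> int:
    """Number of hash functions minimizing false positives: (m/n) ln 2."""
    return max(1, round((m / n) * math.log(2)))


class StorageNode:
    """Hypothetical node holding a local filter plus replicas from peers."""

    def __init__(self, name: str, m: int, expected_objects: int) -> None:
        self.name = name
        self.local = BloomFilter(m, optimal_k(m, expected_objects))
        self.remote: dict[str, BloomFilter] = {}  # peer name -> replica

    def store(self, object_id: str) -> None:
        self.local.add(object_id)

    def broadcast(self, peers: list["StorageNode"]) -> None:
        # Each peer receives a snapshot copy, so a replica becomes stale
        # as soon as the local filter changes again.
        for peer in peers:
            if peer is not self:
                replica = BloomFilter(self.local.m, self.local.k)
                replica.bits = bytearray(self.local.bits)
                peer.remote[self.name] = replica

    def candidates(self, object_id: str) -> list[str]:
        """Peers whose replica claims to hold the object (may be wrong)."""
        return [name for name, bf in self.remote.items() if object_id in bf]


a = StorageNode("a", m=1024, expected_objects=100)
b = StorageNode("b", m=1024, expected_objects=100)
a.store("obj-1")
a.broadcast([a, b])
print(b.candidates("obj-1"))
```

Because peers hold snapshots, an object stored on `a` after the broadcast is generally not visible in `b`'s replica until the next broadcast, while a replica can also report an object its owner never stored. Errors of both kinds are the sort of false rates the citing paper's title refers to.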
