Deduplication has been widely employed in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design specifications of shared-nothing distributed storage systems, such as the absence of a central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes, which are prone to errors in the event of a system failure, resulting in inconsistencies in data and deduplication metadata. In this paper, we propose a robust, fault-tolerant, and scalable cluster-wide deduplication scheme that can eliminate duplicate copies across the cluster. We design a distributed deduplication metadata shard that guarantees performance scalability while preserving the design constraints of shared-nothing storage systems. Chunks and deduplication metadata are placed cluster-wide based on the content fingerprint of chunks. To ensure transactional consistency and garbage identification, we employ a flag-based asynchronous consistency mechanism. We implement the proposed deduplication on Ceph. The evaluation shows high disk-space savings with minimal performance degradation as well as high robustness in the event of sudden server failure.

* Mr. Prince is currently affiliated with Ajou University, Suwon, Republic of Korea.
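The idea of placing chunks by content fingerprint can be illustrated with a minimal sketch. It is not the paper's Ceph implementation. The fixed-size chunking, the SHA-256 fingerprint, the modulo placement rule, and the names `Cluster`, `place`, and `CHUNK_SIZE` are all assumptions made for illustration. Because identical chunks hash to the same fingerprint, they land on the same node, where a duplicate only increments a reference count.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, deliberately tiny so the example is easy to follow


def fingerprint(chunk):
    """Content fingerprint of a chunk (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


def place(fp, num_nodes):
    """Map a fingerprint to a node. Modulo is a simplified stand-in
    for a real placement function."""
    return int(fp, 16) % num_nodes


class Cluster:
    def __init__(self, num_nodes):
        # One metadata shard per node: fingerprint -> [chunk bytes, refcount].
        self.shards = [dict() for _ in range(num_nodes)]

    def write(self, data):
        """Store data chunk by chunk and return its recipe (list of fingerprints)."""
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = fingerprint(chunk)
            shard = self.shards[place(fp, len(self.shards))]
            if fp in shard:
                shard[fp][1] += 1        # duplicate: only bump the reference count
            else:
                shard[fp] = [chunk, 1]   # first copy: store the chunk itself
            recipe.append(fp)
        return recipe

    def read(self, recipe):
        """Reassemble data by looking up each fingerprint on its node."""
        n = len(self.shards)
        return b"".join(self.shards[place(fp, n)][fp][0] for fp in recipe)

    def stored_chunks(self):
        return sum(len(shard) for shard in self.shards)


cluster = Cluster(num_nodes=3)
r1 = cluster.write(b"aaaabbbbaaaa")
r2 = cluster.write(b"bbbbcccc")
```

In this sketch, five logical chunks are written but only three distinct chunks are stored. No central index is consulted, since the fingerprint alone determines which shard holds the metadata.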
NOVA, a state-of-the-art non-volatile memory (NVM) file system, has limited performance due to its coarse-grained per-file lock when multiple threads perform I/Os to a shared file in a manycore environment. For instance, a writer lock blocks other threads attempting to access the same file, even when they access different regions of the file. When multiple threads reading the same file share a cache line containing a reader counter, performance can degrade significantly due to the cache consistency protocol as the number of readers increases. This paper proposes a fine-grained segment-based range lock (SRL) that divides a file into multiple segments and manages a lock variable dynamically for each segment. Consequently, write operations can be parallelized without blocking unless there is a conflict in accessing the same range in a file. Moreover, SRL maintains a reader counter per segment that allows multiple reader threads to perform read operations without causing a performance bottleneck. We evaluated an SRL-based NOVA on an Intel Optane DC persistent memory (PM) manycore server. The benchmarking results showed that the average write throughput of the SRL-based NOVA is 3× higher than that of the original NOVA, and the average read throughput scales linearly, while that of the original NOVA does not scale.
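The locking logic behind a segment-based range lock can be sketched in user space as follows. This is a simplified illustration, not the SRL code from NOVA. The class names, the fixed `SEGMENT_SIZE`, the lazily filled dictionary of per-segment locks, and the non-blocking `try_lock` helper are all assumptions. It also assumes every request has a length greater than zero. Python cannot show the cache-line effects discussed above, so the sketch only demonstrates which accesses conflict.

```python
import threading
from contextlib import contextmanager

SEGMENT_SIZE = 4096  # hypothetical segment size in bytes


class SegmentLock:
    """Reader-writer lock for one segment, with its own reader counter."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire(self, write, blocking=True):
        with self._cond:
            # Readers wait only for a writer; writers also wait for readers.
            while self._writer or (write and self._readers):
                if not blocking:
                    return False
                self._cond.wait()
            if write:
                self._writer = True
            else:
                self._readers += 1
            return True

    def release(self, write):
        with self._cond:
            if write:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()


class RangeLock:
    """Maps a byte range onto per-segment locks created on first use."""

    def __init__(self, segment_size=SEGMENT_SIZE):
        self._segment_size = segment_size
        self._segments = {}
        self._table_lock = threading.Lock()

    def _locks_for(self, offset, length):
        first = offset // self._segment_size
        last = (offset + length - 1) // self._segment_size
        locks = []
        with self._table_lock:
            for index in range(first, last + 1):
                if index not in self._segments:
                    self._segments[index] = SegmentLock()
                locks.append(self._segments[index])
        return locks  # ascending segment order, which avoids deadlock

    def try_lock(self, offset, length, write):
        held = []
        for lock in self._locks_for(offset, length):
            if not lock.acquire(write, blocking=False):
                for taken in held:       # roll back a partial acquisition
                    taken.release(write)
                return False
            held.append(lock)
        return True

    def unlock(self, offset, length, write):
        for lock in reversed(self._locks_for(offset, length)):
            lock.release(write)

    @contextmanager
    def locked(self, offset, length, write):
        locks = self._locks_for(offset, length)
        for lock in locks:
            lock.acquire(write)
        try:
            yield
        finally:
            for lock in reversed(locks):
                lock.release(write)
```

A writer holding `locked(0, 4096, write=True)` blocks only accesses that touch segment 0. A concurrent write to bytes 4096 to 8191 proceeds immediately, and readers of the same segment share it by incrementing that segment's counter rather than a single per-file counter.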