The exponential growth of the amount of data stored worldwide, together with a high level of data redundancy, motivates the active development of data deduplication techniques. The increasing popularity of solid-state drives (SSDs) as primary storage devices forces the adaptation of deduplication techniques to the technical peculiarities of this type of storage (such as write amplification and wear-out), which has prompted active research on data deduplication for SSD-equipped storage. In this survey paper, the authors summarize recent results on deduplication in SSD-enhanced storage and provide a novel taxonomy of the techniques. They classify the techniques by storage device complexity, from the sub-device level up to the storage network. Linux deduplication implementations are discussed, and the results of an experimental comparison of several widely used tools are presented. Finally, the authors briefly outline open problems in the field and possible directions for future research.