As smart devices proliferate, people generate large amounts of data in their daily lives and store it in local or remote file systems. Although modern computer hardware and network technologies can keep up with the growing volume of data, effective file deduplication can still save storage space in both private computing environments and public cloud systems. In this paper, we design and implement several file deduplication schemes for storage devices, each based on a different duplicate-checking rule: file name, file size, and the hash value of the full or partial file content. Comprehensive experimental results show that file deduplication based on partial content hashing offers a better trade-off between computation cost and deduplication accuracy.
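
The duplicate-checking rules named above can be sketched as interchangeable key functions that group files under a common key. The following is a minimal illustration, not the paper's implementation: the function names, the choice of SHA-256, and the 4096-byte prefix used for the partial hash are all assumptions made for the example.

```python
import hashlib
import os
from collections import defaultdict


def partial_hash(path, head_bytes=4096):
    """Hash only the first `head_bytes` of the file (assumed partial rule)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        h.update(f.read(head_bytes))
    return h.hexdigest()


def full_hash(path, chunk_size=65536):
    """Hash the entire file content, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, key_fn):
    """Group files under `root` by `key_fn(path)`.

    Returns only the groups with more than one file, each sorted by path.
    `key_fn` may be os.path.basename (file name), os.path.getsize (file size),
    partial_hash, or full_hash.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[key_fn(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates(root, partial_hash)` reads at most 4096 bytes per file, whereas `find_duplicates(root, full_hash)` reads every byte. In this sketch, two files that share their first 4096 bytes but differ later are reported as duplicates by the partial rule and not by the full rule, which is the kind of accuracy cost that the cheaper rule trades against computation.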