We present DEBAR, a scalable and high-performance de-duplication storage system for backup and archiving, designed to overcome the throughput and scalability limitations of state-of-the-art data de-duplication schemes, including the Data Domain De-duplication File System (DDFS). DEBAR uses a two-phase de-duplication scheme (TPDS) that exploits memory cache and disk index properties to turn the notoriously random and small disk I/Os of fingerprint lookups and updates into large sequential disk I/Os, thereby achieving very high de-duplication throughput. The salient feature of this approach is that both the backup and archiving capacity of the system and its de-duplication performance can be scaled up dynamically and cost-effectively on demand. The approach therefore not only significantly improves the throughput of a single de-duplication server but is also conducive to distributed implementation, and is thus applicable to large-scale and distributed storage systems.
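The abstract does not spell out how TPDS works, so the following is only a minimal sketch of the general idea it describes, under stated assumptions: a first phase filters chunks against an in-memory fingerprint cache and defers everything else, and a second phase resolves the deferred fingerprints in one sorted merge pass over the index instead of one random lookup per fingerprint. The function names, the use of SHA-1, and the in-memory list standing in for the on-disk index are all illustrative assumptions, not DEBAR's actual design.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    # Assumption: SHA-1 content fingerprints; the abstract names no hash.
    return hashlib.sha1(chunk).hexdigest()


def phase_one(chunks, cache):
    """Filter chunks against an in-memory cache and defer the rest.

    Returns the deferred {fingerprint: chunk} map and the number of
    duplicates detected without touching the disk index.
    """
    pending = {}
    duplicates = 0
    for chunk in chunks:
        fp = fingerprint(chunk)
        if fp in cache or fp in pending:
            duplicates += 1
        else:
            pending[fp] = chunk
    return pending, duplicates


def phase_two(pending, sorted_index):
    """Resolve deferred fingerprints in a single merge pass.

    `sorted_index` is a sorted list standing in for an on-disk index that
    is read front to back (hypothetical layout). Returns the fingerprints
    that are genuinely new and the updated sorted index.
    """
    new_fps = []
    merged = []
    i = 0
    for fp in sorted(pending):
        # Copy index entries smaller than the current fingerprint.
        while i < len(sorted_index) and sorted_index[i] < fp:
            merged.append(sorted_index[i])
            i += 1
        if i < len(sorted_index) and sorted_index[i] == fp:
            continue  # already stored; the entry is copied on a later step
        merged.append(fp)
        new_fps.append(fp)
    merged.extend(sorted_index[i:])
    return new_fps, merged
```

Sorting the deferred batch before consulting the index is what lets the lookup and the update share one sequential pass; a real system would stream the index from disk in large blocks rather than hold it in a list.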
This paper presents the implementation and performance evaluation of a real, secure object-based storage system compliant with the T10 OSD standard. In contrast to previous work, our system implements all three security methods of the OSD security protocol defined in the standard, namely CAPKEY, CMDRSP and ALLDATA, together with an Oakley-based authentication protocol by which the Metadata Server (MDS) and the client can be sure of each other's identities. Moreover, our system supports concurrent operations from multiple clients to multiple OSDs. The MDS, a combination of security manager and storage/policy manager, performs access control, global namespace management, and concurrency control. We also evaluate the performance and scalability of our implementation and compare it with iSCSI, NFS and Lustre storage configurations. The overhead of access control is small: compared with the same system without any security mechanism, bandwidth for reads and writes with the CAPKEY and CMDRSP methods decreases by less than 5%, while latency for metadata operations with any of the security methods increases by less than 0.3 ms (5%). The system with the ALLDATA method suffers a higher performance penalty: large sequential accesses run at 46% and 52% of the maximum bandwidth of unsecured storage for reads and writes respectively. The aggregate throughput scales with the number of OSDs (up to 8 in our experiments). The overhead of the SET KEY commands for frequently refreshed partition and working keys is less than 2 ms.
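To make the capability-based access control concrete, here is a minimal sketch of the general pattern such schemes follow, under stated assumptions: the MDS derives a capability key by keying a MAC over the capability with a secret it shares with the OSD, the client uses that key to tag its requests, and the OSD recomputes both to verify. The function names, the opaque byte strings, and the choice of HMAC-SHA1 are illustrative assumptions and do not reproduce the byte layouts of the T10 standard or of the system described above.

```python
import hashlib
import hmac


def capability_key(secret_key: bytes, capability: bytes) -> bytes:
    # MDS side: bind the capability to a secret shared with the OSD.
    return hmac.new(secret_key, capability, hashlib.sha1).digest()


def sign_request(cap_key: bytes, message: bytes) -> bytes:
    # Client side: compute an integrity tag over the protected bytes.
    return hmac.new(cap_key, message, hashlib.sha1).digest()


def osd_verify(secret_key: bytes, capability: bytes,
               message: bytes, tag: bytes) -> bool:
    # OSD side: recompute the capability key and tag, then compare
    # in constant time.
    expected = sign_request(capability_key(secret_key, capability), message)
    return hmac.compare_digest(expected, tag)
```

In this sketch `message` stands in for whatever the chosen security method protects. Covering more bytes per request means more MAC work, which is consistent with the larger bandwidth penalty reported for ALLDATA.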