Large scientific data centers have recently begun providing a number of different types of data storage to satisfy the various needs of their users. Users with interactive accounts, for example, might want a POSIX interface for easy access to data from their interactive machines. Grid computing sites, on the other hand, likely need to provide X.509-based storage protocols such as SRM and GridFTP, since their data management systems are built upon them. Meanwhile, an experiment producing large amounts of data typically demands an archival storage service for the safekeeping of its unique data. To access these various types of storage, users must use specific sets of commands tailored to each one, which makes access to their data complex and difficult. BNLBox is an attempt to provide a unified, easy-to-use storage service for all BNL users to store their important documents, code and data. It is a cloud storage system with an intuitive web interface for novice users. It provides an automated synchronization feature that lets users upload data to their cloud storage without manual intervention, freeing them to focus on analysis rather than on data management software. It provides a POSIX interface for local interactive users, which also simplifies data access from batch jobs. At the same time, it provides users with a straightforward mechanism for archiving large data sets for later processing. The storage space can be used for both code and data within the compute job environment. This paper describes various aspects of the BNLBox storage service.
Tape is an excellent choice for archival storage because of its capacity, cost per GB and long retention periods, but its main drawback is slow access time due to the sequential nature of the medium. Modern enterprise tape drives now support Recommended Access Ordering (RAO), which is designed to reduce data recall/retrieval times. BNL SDCC's mass storage system, managed by HPSS, currently holds more than 100 PB of data on tape. Starting with HPSS version 7.5.1, a new feature called "Tape Order Recall" (TOR) has been introduced. It supports both RAO and non-RAO drives. File access performance can be increased by 30% to 60% compared with random file access. Prior to HPSS 7.5.1, we used in-house scheduling software, ERADAT, which accesses files in order of their logical position on tape. ERADAT has performed very well over more than a decade of use at BNL. In this paper we present a series of test results comparing the performance of TOR and ERADAT under different configurations, to show how effective TOR (with RAO) and ERADAT are and which is the best solution for recalling data from SDCC's tape storage.
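The idea behind ERADAT-style scheduling, reading the files requested from a tape in order of their logical position, can be illustrated with a minimal Python sketch. The `RecallRequest` record, the `schedule_recalls` and `total_seek_distance` helpers, and the use of a single integer as a file's "logical position" are hypothetical simplifications, not the actual interfaces of ERADAT or HPSS. The sketch groups pending requests by tape and sorts each group by position, so each tape can be mounted once and read in a single forward pass.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RecallRequest:
    """Hypothetical recall request: a file path plus where it lives on tape."""
    path: str
    tape_id: str
    position: int  # simplified logical position of the file on its tape


def schedule_recalls(requests):
    """Group requests by tape, then order each tape's files by logical position.

    Returns a list of (tape_id, [paths in read order]) tuples, one per tape,
    with tapes listed in sorted tape_id order.
    """
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req.tape_id].append(req)
    schedule = []
    for tape_id in sorted(by_tape):
        ordered = sorted(by_tape[tape_id], key=lambda r: r.position)
        schedule.append((tape_id, [r.path for r in ordered]))
    return schedule


def total_seek_distance(positions, start=0):
    """Sum of absolute position jumps when reading positions in the given order."""
    distance, current = 0, start
    for pos in positions:
        distance += abs(pos - current)
        current = pos
    return distance


if __name__ == "__main__":
    reqs = [
        RecallRequest("/a", "T2", 500),
        RecallRequest("/b", "T1", 300),
        RecallRequest("/c", "T1", 10),
        RecallRequest("/d", "T2", 20),
    ]
    print(schedule_recalls(reqs))
    # [('T1', ['/c', '/b']), ('T2', ['/d', '/a'])]
    print(total_seek_distance([500, 20]), total_seek_distance([20, 500]))
    # 980 500
```

Treating seek cost as the sum of logical position jumps is only a rough proxy. RAO, by contrast, lets the drive itself compute an order from the physical layout of the data on the tape, which this sketch does not model.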