Abstract. When preparing the Data Management Plan for larger scientific endeavors, PIs have to balance the most appropriate storage qualities along the planned data life-cycle against their price and the available funding. Storage properties include the media type, which implicitly determines the access latency and durability of stored data, the number and locality of replicas, and the available access protocols and authentication mechanisms. Negotiations between the scientific community and the responsible infrastructures generally happen upfront and cover the amount of storage space, the media types (such as disk, tape and SSD) and the foreseeable data life-cycles. With the introduction of cloud management platforms, in both computing and storage, resources can be brokered to achieve the best price per unit of a given quality. However, to allow the platform orchestrator to negotiate the most appropriate resources programmatically, a standard vocabulary for the different properties of resources and a commonly agreed protocol for communicating them have to be available. To agree on a basic vocabulary for storage space properties, the storage infrastructure group in INDIGO-DataCloud, together with INDIGO-associated and external scientific groups, created a working group under the umbrella of the Research Data Alliance (RDA). The Cloud Data Management Interface (CDMI) has been selected as the communication protocol for querying and negotiating storage qualities. Necessary extensions to CDMI are defined in regular meetings between INDIGO and the Storage Networking Industry Association (SNIA). Furthermore, INDIGO is contributing to the SNIA CDMI reference implementation, which serves as the basis for interfacing the various storage systems in INDIGO to the agreed protocol and provides an official open-source skeleton for systems not maintained by INDIGO partners.
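The brokering idea in this abstract, choosing the cheapest storage that still meets a required quality, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `StorageOffer` and `Requirement` types, their field names, the sample offers and the prices are hypothetical and do not come from the RDA vocabulary, the CDMI specification or the INDIGO implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class StorageOffer:
    """One storage class advertised by a provider (hypothetical fields)."""
    name: str
    media: str                  # e.g. "disk", "tape", "ssd"
    latency_ms: float           # typical access latency
    replicas: int               # number of copies kept
    protocols: FrozenSet[str]   # access protocols offered
    price_per_tb: float         # price per terabyte, arbitrary unit


@dataclass(frozen=True)
class Requirement:
    """Qualities a data set needs at one stage of its life-cycle."""
    max_latency_ms: float
    min_replicas: int
    protocol: str


def satisfies(offer: StorageOffer, req: Requirement) -> bool:
    """True if the offer meets every requested quality."""
    return (offer.latency_ms <= req.max_latency_ms
            and offer.replicas >= req.min_replicas
            and req.protocol in offer.protocols)


def cheapest_offer(offers: List[StorageOffer],
                   req: Requirement) -> Optional[StorageOffer]:
    """Return the lowest-priced offer that satisfies req, or None."""
    candidates = [o for o in offers if satisfies(o, req)]
    return min(candidates, key=lambda o: o.price_per_tb, default=None)


# Made-up offers, only to exercise the selection logic.
OFFERS = [
    StorageOffer("tape-archive", "tape", 60000.0, 2,
                 frozenset({"https", "gridftp"}), 10.0),
    StorageOffer("disk-replicated", "disk", 20.0, 2,
                 frozenset({"https", "webdav"}), 40.0),
    StorageOffer("ssd-fast", "ssd", 1.0, 1,
                 frozenset({"https"}), 120.0),
]
```

A broker built along these lines can only compare offers if all providers describe latency, replica count and protocols with the same terms, which is the role the abstract assigns to the shared vocabulary and to CDMI as the query protocol.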
This paper presents the popular backup/archival service developed and operated in Poland by members of the PIONIER network consortium, together with an example application: outsourcing the archival of network traffic in the national academic network. The service is built upon the National Data Storage (NDS) system architecture, deployed on the redundant, high-end, geographically distributed infrastructure of servers, networks and data storage systems built within the PLATON project. The paper discusses the NDS architecture and its features in detail, including the system components, their functionality and the scalability of the system. It also presents how the NDS architecture is deployed in the data storage infrastructure of the PLATON project, with extensive use of server and storage virtualization technologies. We discuss how instantiating the NDS system allows flexible setup of multiple instances of the popular backup/archival service, which can address various, often contradictory, requirements of the service users while sharing a common pool of physical resources. As an example, the system setup for outsourcing the archival of the PIONIER network traffic is presented.