The CERN IT Storage group operates multiple distributed storage systems to support all CERN data storage requirements: the physics data generated by LHC and non-LHC experiments; object and file storage for infrastructure services; block storage for the CERN cloud system; filesystems for general use and specialized HPC clusters; content distribution filesystem for software distribution and condition databases; and sync&share cloud storage for end-user files. The total integrated capacity of these systems exceeds 0.6 Exabyte. Large-scale experiment data taking has been supported by EOS and CASTOR for the last 10+ years. Particular highlights for 2018 include the special HeavyIon run which was the last part of the LHC Run2 Programme: the IT storage systems sustained over 10GB/s to flawlessly collect and archive more than 13 PB of data in a single month. While the tape archival continues to be handled by CASTOR, the effort to migrate the current experiment workflows to the new CERN Tape Archive system (CTA) is underway. Ceph infrastructure has operated for more than 5 years to provide block storage to CERN IT private OpenStack cloud, a shared filesystem (CephFS) to HPC clusters and NFS storage to replace commercial Filers. S3 service was introduced in 2018, following increased user requirements for S3-compatible object storage from physics experiments and IT use-cases. Since its introduction in 2014N, CERNBox has become a ubiquitous cloud storage interface for all CERN user groups: physicists, engineers and administration. CERNBox provides easy access to multi-petabyte data stores from a multitude of mobile and desktop devices and all mainstream, modern operating systems (Linux, Windows, macOS, Android, iOS). CERNBox provides synchronized storage for end-user’s devices as well as easy sharing for individual users and e-groups. CERNBox has also become a storage platform to host online applications to process the data such as SWAN (Service for Web-based Analysis) as well as file editors such as Collabora Online, Only Office, Draw.IO and more. An increasing number of online applications in the Windows infrastructure uses CIFS/SMB access to CERNBox files. CVMFS provides software repositories for all experiments across the WLCG infrastructure and has recently been optimized to efficiently handle nightlybuilds. While AFS continues to provide general-purpose filesystem for internal CERN users, especially as $HOME login area on central computing infrastructure, the migration of project and web spaces has significantly advanced. In this paper, we report on the experiences from the last year of LHC RUN2 data taking and evolution of our services in the past year.. We will highlight upcoming changes and future improvements and challenges.
CERNBox is the cloud collaboration hub at CERN. The service has more than 37,000 user accounts. The backup of user and project spaces data is critical for the service. The underlying storage system hosts over a billion files which amount to 12PB of storage distributed over thousands of disks with a tworeplica layout. Performing a backup operation over this vast amount of data and number of files is a non-trivial task. The original CERNBox backup system (an in-house event-driven file-level system) has been reconsidered and replaced by a new distributed and scalable backup infrastructure based on the open source tool RESTIC. The new system, codenamed cback, provides features needed in the HEP community to guarantee data safety and smooth operation from the system administrators. Daily snapshot-based backups of all our user and project areas along with automatic verification and restores are possible with this the new development. The backup data is also de-duplicated in blocks and stored as objects in a disk-based S3 cluster in another geographical location on the CERN campus, reducing storage costs and protecting critical data from major catastrophic events. We report on the design and operational experience of running the system and future improvement possibilities.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.