Abstract. The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels, and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command-line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.
Distributed Analysis in CMS (Journal of Grid Computing manuscript). Abstract. CMS expects to manage several Pbytes of data each year, distributing them over many computing sites around the world and enabling data access at those centers for analysis. CMS has identified the distributed sites as the primary location for physics analysis to support a wide community of users, with potentially as many as 3000 users. This represents an unprecedented scale of distributed computing resources and number of users. An overview of the computing architecture, the software tools and the distributed infrastructure deployed is reported. Summaries of the experience in establishing efficient and scalable operations to prepare for CMS distributed analysis are presented, followed by the user experience in their current analysis activities.
The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.
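As a rough illustration of the kind of aggregation this abstract describes, the sketch below groups job report documents by site and day and computes simple counts. It is not the Workflow Management Archive's actual schema or pipeline: the field names (`site`, `day`, `exit_code`, `cpu_time`), the sample documents, and the `aggregate` helper are all hypothetical, and plain Python stands in for the Hadoop and Spark tooling mentioned in the abstract.

```python
from collections import defaultdict

# Hypothetical job report documents; the field names are illustrative
# assumptions, not the real framework job report schema.
docs = [
    {"site": "SITE_A", "day": "2024-01-01", "exit_code": 0, "cpu_time": 120.0},
    {"site": "SITE_A", "day": "2024-01-01", "exit_code": 1, "cpu_time": 30.0},
    {"site": "SITE_B", "day": "2024-01-01", "exit_code": 0, "cpu_time": 50.0},
]


def aggregate(documents):
    """Group documents by (site, day) and accumulate simple metrics."""
    summary = defaultdict(lambda: {"jobs": 0, "failed": 0, "cpu_time": 0.0})
    for doc in documents:
        entry = summary[(doc["site"], doc["day"])]
        entry["jobs"] += 1
        # A non-zero exit code is treated as a failed job (an assumption).
        if doc["exit_code"] != 0:
            entry["failed"] += 1
        entry["cpu_time"] += doc["cpu_time"]
    return dict(summary)


if __name__ == "__main__":
    for key, metrics in sorted(aggregate(docs).items()):
        print(key, metrics)
```

At the stated scale of O(1M) documents per day, the same group-and-reduce pattern would be expressed in a distributed framework rather than an in-memory loop.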