CMS computing operations during run 1

Giommi

et al. 2016

J. Phys.: Conf. Ser.

The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis we rarely use its data to predict the system behavior itself. A basic information about computing resources, user activities and site utilization can be really useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data which can be used as a model for dynamic data placement and provide the foundation of data-driven approach for the CMS computing infrastructure. arXiv:1602.07226v1 [physics.data-an]

Section: Dataset Popularitymentioning

confidence: 99%

“…Finally, the Tier-3 centers (mostly at University levels) are used for various analysis tasks. Such distributed Computing Model is demonstrated to be reliable and flexible enough during Run-I operations, see [3].…”

Section: Introductionmentioning

confidence: 99%

Predicting dataset popularity for the CMS experiment

Giommi

et al. 2016

J. Phys.: Conf. Ser.

“…The pushed data are submitted as a bulk collection in certain time intervals defined by WMAgent configuration. 1 The received data are routed to internal short-term storage (STS) based on the document-oriented MongoDB database [7]. Then, a separate daemon reads data from STS, merges records together, converts the JSON data to Avro [10] data format, and writes Avro files to the local file system.…”

Section: System Architecturementioning

confidence: 99%

“…Finally, the Tier-3 centers (mostly at Universities) are used for various analysis tasks. More details about CMS Computing Model can be found elsewhere [1].…”

Section: Introductionmentioning

confidence: 99%

The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

Fischer

Guo

2018

Comput Softw Big Sci

The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.

“…Finally, Tier-3 centers (mostly at Universities) are used for various analysis tasks. More details about CMS Computing Model can be found elsewhere [1].…”

Section: Introductionmentioning

confidence: 99%

Correction to: The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

Fischer

Guo

2019

Comput Softw Big Sci

The CMS experiment at the CERN LHC developed the workflow management archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper, we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the necessary flexibility to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short-and long-term storage layers, and the query language, along with the aggregation pipeline developed to visualize various performance metrics, to assist CMS data operators in assessing the performance of the CMS computing system.