GRACC: New generation of the OSG accounting

Retzke, K.; Weitzel, Derek; Bhat, Shreyas; Levshina, Tanya; Bockelman, Brian; Jayatilaka, B.; Sehgal, Chander; Quick, Rob; Wuerthwein, Frank

doi:10.1088/1742-6596/898/9/092044

Cited by 14 publications

(12 citation statements)

References 2 publications

“…As a next step, we started deploying HTTP via XRootD at four US CMS Tier 2 centers for production transfers to gain long term operational experience. Production transfers via this infrastructure are being tracked using GRACC [12]. Initial experience with this production deployment is shown in Figure 5.…”

Section: Production Deployments of HTTP via XRootD at US CMS Tier 2 C… (mentioning)

confidence: 99%

Testing the limits of HTTPS single point third party copy transfer over the WAN

2020

LHC data is constantly being moved between computing and storage sites to support analysis, processing, and simulation; this is done at a scale that is currently unique within the science community. For example, the CMS experiment on the LHC manages approximately 200PB of storage across 100 sites and, on a daily basis, moves 1PB between sites via GridFTP as primary protocol. This paper describes the performance results we have achieved by exploring alternatives to the GridFTP protocol for these data movements. In particular the HTTPS third party copy over Xrootd data servers as a possible replacement of GridFTP for LHC big data movements.

“…Many systems [1][2][3][4][5] are developed to account resource usage. These systems have similar functional components but different implementations.…”

Section: Related Work (mentioning)

confidence: 99%

Cosmos : A Unified Accounting System both for the HTCondor and Slurm Clusters at IHEP

Shi, Jiang, et al. 2020

EPJ Web Conf.

HTCondor was adopted to manage the High Throughput Computing (HTC) cluster at IHEP in 2016. In 2017 a Slurm cluster was set up to run High Performance Computing (HPC) jobs. To provide accounting services for these two clusters, we implemented a unified accounting system named Cosmos. Multiple workloads bring different accounting requirements. Briefly speaking, there are four types of jobs to account. First of all, 30 million single-core jobs run in the HTCondor cluster every year. Secondly, Virtual Machine (VM) jobs run in the legacy HTCondor VM cluster. Thirdly, parallel jobs run in the Slurm cluster, and some of these jobs are run on the GPU worker nodes to accelerate computing. Lastly, some selected HTC jobs are migrated from the HTCondor cluster to the Slurm cluster for research purposes. To satisfy all the mentioned requirements, Cosmos is implemented with four layers: acquisition, integration, statistics and presentation. Details about the issues and solutions of each layer will be presented in the paper. Cosmos has run in production for two years, and the status shows that it is a well-functioning system, also meets the requirements of the HTCondor and Slurm clusters.

“…However, further work had to be done to take the new monitoring a step further and provide a way for data to be pushed to more than one backend. The Grid Accounting Collector (GRACC) [7] is the official monitoring Elasticsearch instance for OSG. It is a natural place to also back up all factory statistics.…”

Section: New Monitoring: Version (mentioning)

confidence: 99%

Factory Monitoring for the 21st Century

et al. 2019

A key aspect of pilot-based grid operations are the GlideinWMS pilot factories. A proper and efficient use of any central block in the grid infrastructure for operations is inevitable, and GlideinWMS factories are no exception. The monitoring package for the GlideinWMS factory was originally developed when the factories were serving a couple of VOs and tens of sites. Today with the factories serving tens of VOs and hundreds of sites around the globe an update of the monitoring is due. Moreover with the new availability of industry open source storage and graphing packages an opportunity remains open. This work presents the changes made to the factory monitoring to leverage different technologies: Elasticsearch, RabbitMQ, Grafana, and InfluxDB to provide a centralized view of the status and work of several GlideinWMS factories located in different continents around the globe.
