Abstract. The operation of the CMS computing system requires a complex monitoring system to cover all its aspects: central services, databases, the distributed computing infrastructure, production and analysis workflows, the global overview of the CMS computing activities and the related historical information. Several tools are available to provide this information, developed both inside and outside of the collaboration and often used in common with other experiments. Although the current monitoring has allowed CMS to perform its computing operations successfully, an evolution of the system is clearly required, both to adapt to the recent changes in the data and workload management tools and models and to address some shortcomings that make its usage less than optimal. Therefore, a coordinated effort was recently started in CMS and is still ongoing, aiming at improving the entire monitoring system by identifying its weaknesses and the new requirements from the stakeholders, rationalising and streamlining existing components, and driving future software development. This contribution gives a complete overview of the CMS monitoring system and a description of all the recent activities that have been started with the goal of providing a more integrated, modern and functional global monitoring system for computing operations.
Introduction
The CMS computing system has been operating since 2008, well before the start of LHC proton-proton data taking. Its role is to enable the collaboration to process, store, distribute and analyse the data collected by the CMS experiment. As for the other LHC experiments, it is built over the infrastructure and the services provided by the Worldwide LHC Computing Grid (WLCG), as an upper layer of services and tools that take into account the CMS data model and implement the CMS computing model.
The operation of such a complex system requires substantial effort from CMS, from WLCG and from the sites. It is therefore of utmost importance to know the status of the system and its history in detail at all times, as the availability of good monitoring information is a prerequisite for efficient operations. In November 2010 CMS critically reviewed all the available monitoring with the goal of identifying its weaknesses and finding ways to improve it.
The purpose of this paper is i) to give an overview of the CMS monitoring system, the areas covered by it, the existing tools and the known deficiencies of the system, ii) to describe the process pursued by CMS to achieve a global monitoring system based on a limited number of common tools and iii) to discuss the ongoing developments along the lines defined by the global strategy.