The complex infrastructure, growing scale, and variety of a large-scale Cloud System (LCS) make it challenging to monitor the health of its components. Existing advanced monitoring systems often fail to help the Cloud Operation Team gain meaningful insights into the system and its underlying components. In this thesis, we propose a near-real-time, interactive visual monitoring tool based on heatmaps that helps developers and maintainers of an LCS perform exploratory analysis of its health and supports decision-making on resource planning and provisioning, configuration design, and problem identification. We validated the tool in a real-world setting by monitoring IBM Cloud Console (an LCS used by IBM to monitor IBM Cloud). The results show that our heatmaps can provide actionable insights. In particular, the tool has helped the team diagnose anomalous component behaviour, identify heavy or low traffic, find latency issues, and make critical business decisions. Our tool is of interest to practitioners, as it can be used to monitor the health of an arbitrary LCS. Moreover, it can serve as a building block for creating a theory of monitoring complex software systems, which is of interest to academics.