With the development of the Internet of Things, millions of sensors are being deployed in cities to collect realtime data. This leads to a need for checking city states against city requirements at runtime. In this paper, we develop a novel spatial-temporal specification-based monitoring system for smart cities. We first describe a study of over 1,000 smart city requirements, some of which cannot be specified using existing logic such as Signal Temporal Logic (STL) and its variants. To tackle this limitation, we develop SaSTL-a novel Spatial Aggregation Signal Temporal Logic-for the efficient runtime monitoring of safety and performance requirements in smart cities. We develop two new logical operators in SaSTL to augment STL for expressing spatial aggregation and spatial counting characteristics that are commonly found in real city requirements. We define Boolean and quantitative semantics for SaSTL in support of the analysis of city performance across different periods and locations. We also develop efficient monitoring algorithms that can check a SaSTL requirement in parallel over multiple data streams (e.g., generated by multiple sensors distributed spatially in a city). Additionally, we build a SaSTL-based monitoring tool to support decision making of different stakeholders to specify and runtime monitor their requirements in smart cities. We evaluate our SaSTL monitor by applying it to three case studies with large-scale real city sensing data (e.g., up to 10,000 sensors in one study). The results show that SaSTL has a much higher coverage expressiveness than other spatial-temporal logics, and with a significant reduction of computation time for monitoring requirements. We also demonstrate that the SaSTL monitor improves the safety and performance of smart cities via simulated experiments.