Cities can be observed through a broad set of sensing technologies, ranging from physical sensors in the streets, to socioeconomic reports, to sources that capture the behaviour of citizens and visitors, such as mobile phone records, social media posts, and other digital traces. In this paper, we propose a conceptual framework for putting this variety of Big Data sources to use, with a unified approach that applies spatial and temporal analysis to heterogeneous streams of data. We define spatial analysis in terms of conceptual grids (made of cells) laid over the city space, and then we study: the time series of signals at both grid and cell level; the correlation across signals and across cells; the prediction of city dynamics based on multiple signals; and the identification of anomalies based on the difference between the observed dynamics and their prediction. To implement this model we propose a general architectural framework that uses Big Data technologies (such as HDFS, YARN, Hive, Pig, Cascalog, Spark, Spark SQL, Spark Streaming and SparkR) and can be deployed in different configurations according to different needs. By taking an inherently data-science approach to the problem, we are able to address at scale both technical problems, such as the heterogeneous time and space granularity of the data, and the appropriate interpretation of the results, through tools that enable intuitive and immediate visual perception of emerging patterns and dynamics. We demonstrate the feasibility, generality and effectiveness of our Urban Data Science at scale approach through multiple use cases and examples taken from real-world requirements collected in various cities and accounting for diverse business and city needs.
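
As a rough illustration of the grid-based analysis outlined above, the following minimal Python sketch maps geolocated events to grid cells, builds a per-cell time series of event counts, and flags time buckets whose observed value departs strongly from a predicted one. It is a sketch under stated assumptions, not the framework's implementation: the function names (`cell_of`, `cell_time_series`, `flag_anomalies`), the degree-based square cells, and the parameters `cell_deg`, `bucket_s`, `threshold` and `min_spread` are all hypothetical, and the leave-one-out mean used as the "prediction" is a deliberately naive stand-in for a prediction model based on multiple signals.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cell_of(lat, lon, lat0, lon0, cell_deg):
    """Map a coordinate to a (row, col) cell of a square grid.

    (lat0, lon0) is the south-west corner of the grid and cell_deg the
    cell side in degrees (both are illustrative assumptions).
    """
    return (int((lat - lat0) // cell_deg), int((lon - lon0) // cell_deg))


def cell_time_series(events, lat0, lon0, cell_deg, bucket_s):
    """Count events per cell and per time bucket.

    events: iterable of (timestamp_seconds, lat, lon) tuples.
    Returns {cell: {bucket_index: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for ts, lat, lon in events:
        cell = cell_of(lat, lon, lat0, lon0, cell_deg)
        counts[cell][int(ts // bucket_s)] += 1
    return counts


def flag_anomalies(series, threshold=3.0, min_spread=1.0):
    """Return the sorted buckets whose observed count is anomalous.

    Naive baseline (an assumption, not a real forecasting model): each
    bucket is "predicted" as the mean of all the other buckets, and it is
    flagged when |observed - predicted| exceeds threshold times the
    spread of the other buckets (floored at min_spread).
    """
    if len(series) < 3:
        return []  # too little history for a meaningful baseline
    flagged = []
    for bucket, observed in series.items():
        others = [v for b, v in series.items() if b != bucket]
        predicted = mean(others)
        spread = max(pstdev(others), min_spread)
        if abs(observed - predicted) > threshold * spread:
            flagged.append(bucket)
    return sorted(flagged)
```

In this sketch each signal would yield its own `{cell: {bucket: count}}` structure over the same grid and time buckets, which is one simple way to bring sources with heterogeneous time and space granularity onto a common footing before correlating them.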