Massive Multiple-Input Multiple-Output (MIMO) is one of the key technologies in 5G and is expected to deliver superior spectral and energy efficiency. This paper is the first to evaluate Massive MIMO using realistic performance metrics in a heterogeneous urban environment, namely 20 Macrocells and 20 Picocells providing cellular services in the city of Bristol (UK). Our study is based on a 3D ray-tracing propagation channel model that uses real city maps, which we convolve with individual 3D complex polarimetric antenna radiation patterns for both the base station (BS) and the User Equipment (UE). We consider a system configuration with 128 antenna elements at the BS and up to 16 receive antennas across the served terminals (i.e. 16 single-antenna UEs or 8 dual-antenna UEs). Eigen-beamforming precoding and a Received Bit-level mutual Information Rate (RBIR) based abstraction simulator are used at the system level. Millions of cellular links were simulated to ensure statistically relevant results. We quantify the realistically achievable capacity as a function of cell size, number of user terminals, and user rank, as well as the gain over traditional 4G Long-Term Evolution (LTE) networks. Overall, 128Tx-16Rx Massive MIMO (with rank-2 UEs) was found to provide up to 434% and 478% more capacity than traditional 8Tx-8Rx LTE Single-User MIMO in Macrocells and Picocells, respectively.
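To illustrate what eigen-beamforming precoding means for a rank-2 (dual-antenna) UE, the sketch below computes the transmit precoders as the dominant right singular vectors of a 2 x N channel. It does this through the eigen-decomposition of the 2x2 Gram matrix G = H H^H, so no linear-algebra library is needed. This is a minimal sketch under stated assumptions, not the simulator used in the study. The function names are hypothetical. The example channel is i.i.d. Rayleigh rather than the ray-traced Bristol channel. The per-stream rate uses the Shannon formula with equal power split, not the RBIR abstraction described above.

```python
import math
import random

Matrix = list[list[complex]]  # rows: UE receive antennas, cols: BS transmit antennas


def gram(h: Matrix) -> Matrix:
    """G = H H^H for a 2 x N channel matrix H."""
    n_tx = len(h[0])
    return [[sum(h[i][n] * h[j][n].conjugate() for n in range(n_tx))
             for j in range(2)] for i in range(2)]


def eig_hermitian_2x2(g: Matrix) -> list[tuple[float, list[complex]]]:
    """Eigenpairs of a 2x2 Hermitian matrix, sorted by descending eigenvalue."""
    a, d, b = g[0][0].real, g[1][1].real, g[0][1]
    if abs(b) < 1e-12:
        # Already diagonal: the eigenvectors are the coordinate axes.
        pairs = [(a, [1 + 0j, 0j]), (d, [0j, 1 + 0j])]
        return sorted(pairs, key=lambda p: p[0], reverse=True)
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    pairs = []
    for lam in (mean + radius, mean - radius):
        # [b, lam - a] solves the first row: (a - lam) * u0 + b * u1 = 0.
        u = [b, complex(lam - a)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in u))
        pairs.append((lam, [x / norm for x in u]))
    return pairs


def eigen_beamformers(h: Matrix) -> list[tuple[float, list[complex]]]:
    """Unit-norm precoders v_i = H^H u_i / sqrt(lambda_i), one per non-zero eigenmode."""
    n_tx = len(h[0])
    modes = []
    for lam, u in eig_hermitian_2x2(gram(h)):
        if lam <= 1e-12:
            continue  # rank-deficient channel: no usable stream on this eigenmode
        v = [sum(h[r][n].conjugate() * u[r] for r in range(2)) / math.sqrt(lam)
             for n in range(n_tx)]
        modes.append((lam, v))
    return modes


def shannon_rate(gains: list[float], snr: float) -> float:
    """Sum rate in bit/s/Hz with the total SNR split equally over the streams."""
    return sum(math.log2(1 + snr / len(gains) * g) for g in gains)


# Usage: one dual-antenna UE served by a 128-element BS (assumed i.i.d. Rayleigh channel).
rng = random.Random(0)
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(128)]
     for _ in range(2)]
modes = eigen_beamformers(H)
print("eigenmode gains:", [round(lam, 1) for lam, _ in modes])
print("rank-2 rate (bit/s/Hz):", round(shannon_rate([lam for lam, _ in modes], snr=1.0), 2))
```

Each precoder steers one spatial stream onto one eigenmode of the UE's channel. The received power on stream i equals the eigenvalue lambda_i, which is why a rank-2 UE can carry two streams. With 128 BS elements, both eigenvalues grow roughly in proportion to the array size.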