Most urban tree inventories depend on resource-intensive, field-based assessments that are unevenly distributed in space and time. More recently, field inventories have been combined with airborne multispectral, hyperspectral, and LiDAR remote sensing and with spaceborne multispectral remote sensing. Significant advances have been made in urban tree GIS databases and in remote sensing methods, including delineating individual tree crowns, extracting tree species metrics, and applying classification techniques. Generally, remote sensing methods distinguish individual urban trees using either pixel-based or object-based methods, while image classification procedures are typically divided into parametric approaches (e.g., regression-based classification, Bayesian methods, and principal component analysis) and non-parametric approaches, such as machine learning (e.g., random forests, support vector machines) and deep learning (e.g., convolutional neural networks). Our synthesis of the current state of the science suggests that sensors with the highest spatial (m), spectral (bands), and temporal (repeat time) resolutions yield the most accurate tree species identification. Combining airborne LiDAR with hyperspectral data, or airborne LiDAR with spaceborne high-resolution multispectral data, yields the highest accuracy for the most diverse urban forests. An object-based non-parametric approach, such as a fully convolutional neural network, scores higher in accuracy assessments than pixel-based parametric approaches. Future studies can leverage global and regional GIS field inventory databases, together with LiDAR and spaceborne sensors, to expand the scope of studies within and across multiple cities.