Vegetation phenology and productivity play a crucial role in surface energy balance, plant and animal distributions, and animal movement and habitat use. They can be measured with remote sensing metrics, including start of season (SOS), peak instantaneous rate of green-up date (PIRGd), peak of season (POS), end of season (EOS), and integrated vegetation indices. However, for most metrics, we do not yet understand how well remotely sensed data products agree with near-surface observations. We also need summaries of the changes over time, spatial distribution, variability, and consistency of remote sensing metrics for vegetation timing and quality. We compare metrics from 10 leading remote sensing datasets against a network of PhenoCam near-surface cameras throughout the western United States from 2002 to 2014. Most phenology metrics representing a date (SOS, PIRGd, POS, and EOS), rather than a duration (length of spring, length of growing season), agreed better with near-surface metrics, but results varied by dataset, metric, and land cover, with the absolute mean bias ranging from 0.38 days (PIRGd) to 37.92 days (EOS). Datasets agreed more closely with PhenoCam metrics in shrublands, grasslands, and deciduous forests than in evergreen forests. Phenology metrics agreed more closely than productivity metrics, except for a few datasets in deciduous forests. Using two datasets that cover 1982–2016 and best agreed with PhenoCam metrics, we analyzed changes in growing seasons over time. Both datasets exhibited substantial spatial heterogeneity in the direction of phenology trends. Variability of metrics increased over time in some areas, particularly in the Southwest. Approximately 60% of pixels had a consistent trend direction between the two datasets for SOS, POS, and EOS, with the direction varying by location. In all ecoregions except Mediterranean California, EOS has become later.
This study comprehensively compares remote sensing datasets across multiple growing season metrics and discusses considerations for applied users to inform their data choices.
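The abstract reports agreement as a mean bias in days and trend consistency as the share of pixels whose trend direction matches between two datasets. The sketch below illustrates both calculations under stated assumptions: bias is taken as the remote sensing date minus the PhenoCam date for matched site-years, and trends are ordinary least-squares slopes per pixel. The study's actual trend method may differ (for example, a nonparametric estimator). The function names and the toy values are hypothetical and illustrative only; they are not data from the study.

```python
import numpy as np


def mean_bias(remote_doy, phenocam_doy):
    """Mean bias in days: remote sensing date minus PhenoCam date (matched site-years).

    Positive values mean the remote sensing metric falls later than PhenoCam.
    The absolute mean bias is abs() of this value.
    """
    remote = np.asarray(remote_doy, dtype=float)
    ref = np.asarray(phenocam_doy, dtype=float)
    return float(np.mean(remote - ref))


def ols_slopes(years, series):
    """Ordinary least-squares slope (days per year) for each pixel.

    Assumption: `series` has shape (n_years, n_pixels), one column per pixel.
    """
    x = np.asarray(years, dtype=float)
    y = np.asarray(series, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean(axis=0)
    # Slope = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2), computed for all columns at once.
    return (x_centered @ y_centered) / (x_centered @ x_centered)


def trend_sign_agreement(slopes_a, slopes_b):
    """Fraction of pixels whose trend direction (sign of slope) matches between datasets.

    A zero slope only matches another zero slope.
    """
    return float(np.mean(np.sign(slopes_a) == np.sign(slopes_b)))


# Hypothetical toy inputs (day of year), for illustration only.
bias = mean_bias([120, 135, 128], [118, 130, 131])
print(f"mean bias: {bias:.2f} days, absolute mean bias: {abs(bias):.2f} days")

years = [2000, 2001, 2002, 2003]
eos_a = np.array([[270, 250, 260],
                  [272, 249, 260],
                  [274, 248, 260],
                  [276, 247, 260]])
eos_b = np.array([[265, 255, 262],
                  [266, 256, 261],
                  [267, 257, 260],
                  [268, 258, 259]])
slopes_a = ols_slopes(years, eos_a)
slopes_b = ols_slopes(years, eos_b)
print("EOS slopes A:", slopes_a, "B:", slopes_b)
print("trend-direction agreement:", trend_sign_agreement(slopes_a, slopes_b))
```

With these toy values the mean bias is about 1.33 days and trend direction agrees for one of three pixels. Counting a zero slope as disagreement is a design choice; a user might instead mask pixels whose trends are not statistically significant before comparing directions.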