Vegetation phenology and its spatiotemporal driving factors are essential for understanding global climate change, the surface carbon cycle and regional ecology, and further quantitative studies of spatiotemporal heterogeneity and its two-way driving mechanisms are needed. Based on MODIS phenology, meteorology, land cover and other data from 2001 to 2019, this paper analyzes the characteristics of phenological change in the Yangtze River Delta along three dimensions: time, horizontal space and elevation. The spatiotemporal heterogeneity of phenology and its driving factors are then explored with random forest and geographic detector methods. The results show that (1) the advance of the start of season (SOS) is insignificant, at 0.17 days per year, whereas the end of season (EOS) shows a significant delay of 0.48 days per year. Preseason temperature contributes more to SOS, while preseason precipitation is the main factor determining EOS. (2) At the provincial scale, spatial differences in the phenological indices do not strictly follow a latitudinal gradient. The SOS of Jiangsu and Anhui is earlier than that of Zhejiang and Shanghai, and EOS shows an obvious double-clustering phenomenon. In addition, EOS responds divergently across elevation grades, with the most significant changes observed at grades below 100 m. (3) Land cover (LC) type is a major factor in the spatial heterogeneity of phenology, and LC change may also be a minor driver of the interannual change of phenology. Furthermore, nighttime land surface temperature (NLST) makes a relatively large contribution to spatial heterogeneity in non-core urban areas, whereas population density (PD) contributes little. These findings could provide a new perspective on phenology and its complex interactions with natural and anthropogenic factors.
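
For readers unfamiliar with the geographic detector, the factor detector is commonly summarized by a q-statistic, q = 1 - SSW/SST, where SSW is the within-stratum sum of squares and SST is the total sum of squares of the response. The following is a minimal sketch of that general statistic, not the processing chain used in this study; the function name `geodetector_q` and the sample SOS values and land cover labels are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def geodetector_q(values, strata):
    """Return the factor-detector q-statistic, q = 1 - SSW / SST.

    values: numeric response per sample (e.g. SOS as day of year).
    strata: categorical label per sample (e.g. land cover type).
    """
    if len(values) != len(strata):
        raise ValueError("values and strata must have the same length")
    if not values:
        raise ValueError("at least one sample is required")

    def sum_of_squares(xs):
        # Sum of squared deviations from the mean of xs.
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs)

    # Group the response values by stratum label.
    groups = defaultdict(list)
    for value, label in zip(values, strata):
        groups[label].append(float(value))

    sst = sum_of_squares([float(v) for v in values])
    if sst == 0:
        raise ValueError("response has zero variance; q is undefined")

    ssw = sum(sum_of_squares(group) for group in groups.values())
    return 1.0 - ssw / sst


# Hypothetical SOS values (day of year) and land cover labels.
sos = [80, 82, 100, 102]
land_cover = ["cropland", "cropland", "forest", "forest"]
q = geodetector_q(sos, land_cover)
print(f"q = {q:.4f}")
```

A value of q near 1 indicates that the stratifying factor accounts for most of the spatial variance of the response, while a value near 0 indicates that it accounts for almost none.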