Despite extensive studies of the drivers of environmental tax revenue (ETR) and its linkages with socioeconomic variables over time, an in-depth investigation of its spatiotemporal drivers and intrinsic characteristics (e.g., convergence and complex network structure) is still needed to inform better environmental tax policy for sustainable development. This study therefore analyzes the spatiotemporal drivers, convergence trend, and complex network of provincial ETR in China over 2000–2019, using temporal and spatial logarithmic mean Divisia index (LMDI) models, convergence models, and social network analysis, respectively. First, we identified two convergence clubs of ETR among China’s provinces over the period. Second, GDP per capita and tax intensity were, respectively, the positive and negative drivers of ETR growth. Third, within-group differences in tax intensity and GDP per capita, together with differences in population and GDP per capita, were the main drivers widening the overall ETR gap. Fourth, the original hierarchical structure of ETR spatial correlation changed over the period, and provinces showed some heterogeneity within the ETR spatial association network. The study highlights that ETR plays a significant role in maintaining sustainable development and suggests that greater importance be attached to environmental tax policies at various levels.