Increasingly available spaceborne sensors provide unprecedented opportunities for large‐scale, timely and continuous monitoring of tree species diversity (TSD). However, because sensors differ in spectral and spatial resolution, choosing among them is not always straightforward. In this work, we investigated the effects of spatial and spectral resolution on TSD mapping for four spaceborne sensors (RapidEye, Landsat‐8, Sentinel‐2 and PlanetScope) in an area of approximately 4000 km² within the Black Forest, Germany. We employed a random forest (RF) regression model to predict Shannon–Wiener diversity from seven types of spectral heterogeneity metrics (texture, coefficient of variation, Rao's Q, convex hull volume, spectral angle mapper, convex hull area and spectral species diversity) and a full survey dataset from 135 one‐ha sample plots. We compared the RF model's performance across sensors and spatial resolutions. The Sentinel‐2‐based TSD model achieved the highest accuracy (mean R²: 0.477, mean root‐mean‐square error (RMSE): 0.274). The RapidEye‐based TSD model was less accurate (mean R²: 0.346, mean RMSE: 0.303) but still outperformed the PlanetScope‐ and Landsat‐8‐based TSD models. The best spatial resolutions for predicting TSD were 10 m for Sentinel‐2 and RapidEye and 15 m for PlanetScope. The near‐infrared (NIR) band was the most favourable spectral band for predicting TSD. Texture metrics and Rao's Q outperformed the other spectral heterogeneity metrics. Our results show that spaceborne optical imagery (especially Sentinel‐2) can be successfully used for large‐scale TSD mapping, but that the choice of sensor can significantly affect mapping accuracy in temperate montane forests.
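
For readers unfamiliar with two of the quantities named above, the following minimal sketch illustrates the standard textbook definitions of the Shannon–Wiener index (the response variable) and Rao's Q (one of the spectral heterogeneity metrics). It is not the authors' implementation. The function names, the natural-log base, the single-band absolute-difference distance and the equal pixel weighting are all assumptions made for illustration; the abstract does not specify these choices.

```python
import math
from collections import Counter


def shannon_wiener(species_labels):
    """Shannon-Wiener index H' = -sum(p_i * ln(p_i)).

    species_labels: one label per surveyed tree in a plot (hypothetical input
    format). The natural log is an assumption; other bases are also in use.
    """
    if not species_labels:
        raise ValueError("at least one tree is required")
    counts = Counter(species_labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def rao_q(pixel_values):
    """Rao's Q = sum_i sum_j d_ij * p_i * p_j for one spectral band.

    Assumes d_ij = |v_i - v_j| and equal weights p_i = 1/N for the N pixels
    in a plot or moving window.
    """
    n = len(pixel_values)
    if n == 0:
        raise ValueError("at least one pixel is required")
    total = sum(abs(a - b) for a in pixel_values for b in pixel_values)
    return total / (n * n)


if __name__ == "__main__":
    # Toy plot with two equally abundant species: H' equals ln(2).
    print(shannon_wiener(["beech", "beech", "spruce", "spruce"]))
    # Toy window with two pixels of reflectance 0 and 1.
    print(rao_q([0.0, 1.0]))
```

Both functions return zero for a perfectly homogeneous input (a single species, or identical pixel values), which is the intuition behind using spectral heterogeneity as a proxy for species diversity.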