Recent epidemiological studies have shown that the adverse health effects of PM2.5 extend beyond respiratory and cardiovascular diseases and also affect brain development and metabolic diseases. This underlines the need for accurate PM2.5 data at high spatial and temporal resolution. Because station measurements provide only point information, which is mostly insufficient for area-wide monitoring, satellite data have increasingly been used to monitor PM2.5 distributions comprehensively. Although the accuracy and reliability of satellite-based PM2.5 estimates have improved, most studies still rely on a single sensor, even though several datasets have meanwhile become available; this calls for a systematic comparison. This study presents the first systematic evaluation of four satellite-based aerosol optical depth (AOD) datasets, obtained from different sensors and retrieval methods, for deriving ground-level PM2.5 concentrations. We apply a random forest model and analyze how the resolution and coverage of the satellite data, as well as the inclusion of proxy data, affect model performance. We examine AOD data from the Moderate Resolution Imaging Spectroradiometer (MODIS) onboard the Terra and Aqua satellites, including the Dark Target (DT) algorithm products and the Multi-Angle Implementation of Atmospheric Correction (MAIAC) product. Additionally, we explore more recent datasets from the Sea and Land Surface Temperature Radiometer (SLSTR) onboard Sentinel-3A and from the Tropospheric Monitoring Instrument (TROPOMI) onboard the Sentinel-5 Precursor (S5P). The method is demonstrated for Germany in 2018, where a dense in situ measurement network and relevant proxy data are available. Overall, model performance is satisfactory for all four datasets, with cross-validated R² values ranging from 0.68 to 0.77, and excellent for MODIS AOD, with correlations of almost 0.9.
Model performance depends strongly on the coverage and resolution of the AOD training data. Feature importance rankings show that, for SLSTR and TROPOMI, AOD carries less weight than the proxy data.
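The random forest workflow summarized above (AOD plus proxy predictors, cross-validated R², feature importance rankings) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The hyperparameters (`n_trees`, `max_depth`, `min_leaf`, `max_features`), the quantile-based split search, the impurity-decrease importance measure (the abstract does not say which importance measure was used) and the synthetic data with hypothetical columns (AOD, one proxy variable, one noise feature) are all assumptions made for illustration. R² is computed here as the coefficient of determination 1 − SS_res/SS_tot over pooled out-of-fold predictions; the study's exact cross-validation scheme is not given in this abstract.

```python
import random


def _sse(ys):
    """Sum of squared deviations from the mean (regression impurity)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def _grow(X, y, idx, depth, cfg, rng, imp):
    """Grow one regression tree on rows `idx`; leaves hold mean values."""
    ys = [y[i] for i in idx]
    mean = sum(ys) / len(ys)
    if depth >= cfg["max_depth"] or len(idx) < 2 * cfg["min_leaf"]:
        return mean
    parent, best = _sse(ys), None
    # Random feature subset per split, as in a random forest.
    for f in rng.sample(range(len(X[0])), cfg["max_features"]):
        vals = sorted(X[i][f] for i in idx)
        # Assumption: only 9 quantile thresholds are tried, to keep this fast.
        for q in range(1, 10):
            t = vals[q * len(vals) // 10]
            left = [i for i in idx if X[i][f] < t]
            right = [i for i in idx if X[i][f] >= t]
            if min(len(left), len(right)) < cfg["min_leaf"]:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return mean
    cost, f, t, left, right = best
    imp[f] += parent - cost  # impurity decrease credited to feature f
    return (f, t, _grow(X, y, left, depth + 1, cfg, rng, imp),
            _grow(X, y, right, depth + 1, cfg, rng, imp))


def _predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def fit_forest(X, y, n_trees=20, max_depth=6, min_leaf=3, max_features=2, seed=0):
    """Bagged trees; returns (trees, normalized impurity-based importances)."""
    rng = random.Random(seed)
    cfg = {"max_depth": max_depth, "min_leaf": min_leaf, "max_features": max_features}
    imp = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(_grow(X, y, boot, 0, cfg, rng, imp))
    total = sum(imp) or 1.0
    return trees, [v / total for v in imp]


def predict(trees, row):
    return sum(_predict_tree(t, row) for t in trees) / len(trees)


def cv_r2(X, y, k=5, seed=0):
    """k-fold CV: R^2 = 1 - SS_res / SS_tot over pooled out-of-fold predictions."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    pred = [0.0] * len(X)
    for fold in range(k):
        test = order[fold::k]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        trees, _ = fit_forest([X[i] for i in train], [y[i] for i in train], seed=seed + fold)
        for i in test:
            pred[i] = predict(trees, X[i])
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic demo data (hypothetical, not from the study): columns are
# [AOD, one proxy variable, pure noise]; the target is PM2.5-like.
rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(150)]
y = [50 * r[0] + 10 * r[1] + rng.gauss(0, 1) for r in X]
r2 = cv_r2(X, y)
_, importance = fit_forest(X, y)
print(f"cross-validated R^2 = {r2:.2f}; importances = {[round(v, 2) for v in importance]}")
```

In practice an established library such as scikit-learn's `RandomForestRegressor` would replace the hand-written trees; the sketch only makes the bagging, random feature selection, out-of-fold R² and importance bookkeeping explicit.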