Abstract. Photochemical grid models (PGMs) have been applied increasingly over the past decade to address diverse scientific and regulatory compliance issues associated with deteriorated air quality in China. A solid evaluation of model performance helps ensure the robustness and reliability of the baseline modelling results on which subsequent applications are built; model performance evaluation (MPE) is therefore a critical step in any PGM application. MPE procedures and associated benchmarks have been proposed for PGM applications in the United States and Europe. However, because of the numerous input data required, the diversity of model configurations, and the continuing evolution of the models themselves, no two PGM applications are exactly the same. Benchmarks derived from studies outside China may therefore not be appropriate for evaluating the increasing number of PGM applications in China. Here we follow an established approach from the published literature to recommend statistical benchmarks for the evaluation of simulated particulate matter (PM) concentrations in China. A total of 128 peer-reviewed articles published between 2006 and mid-2019 that applied one of the four most frequently used PGMs in China are compiled to summarize operational model performance results. Quantile distributions of common statistical metrics are presented for total PM2.5 and its speciated components. The influences of different model configurations, including modelling regions and seasons, spatial resolution of modelling grids, temporal resolution of MPE, etc., on the range of reported statistics are discussed. Benchmarks for four frequently used evaluation metrics are provided for two tiers, “goals” and “criteria”, where “goals” represent the best model performance that a model is currently expected to achieve and “criteria” represent the model performance that the majority (i.e. two thirds) of studies can meet.
Our proposed benchmarks are further compared with those developed for the United States and Europe. Additional recommendations for MPE practices are also given. The results of this study should help the ever-growing modelling community in China to assess more objectively how well their simulation results compare with those of previous studies and to better demonstrate the credibility and robustness of their PGM applications prior to subsequent regulatory assessments.
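The two-tier idea can be illustrated with a minimal sketch of how benchmarks might be read off the quantile distribution of a reported error metric for which lower values are better. The "criteria" tier is placed at the 67th percentile, consistent with the statement that two thirds of studies can meet it. The 33rd percentile used for "goals" is an assumption made for this sketch only, since the abstract does not state which quantile defines that tier. The function names `percentile`, `derive_benchmarks` and `fraction_meeting` are hypothetical, and the linear interpolation rule is one of several possible conventions.

```python
def percentile(values, q):
    """Return the q-th percentile (0 <= q <= 100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0   # fractional index into the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def derive_benchmarks(reported_errors, goal_q=33.0, criteria_q=67.0):
    """Derive two-tier benchmarks from error statistics reported across studies.

    criteria_q=67 follows the "two thirds of studies" definition of criteria.
    goal_q=33 is an assumption of this sketch, not a value given in the abstract.
    Lower metric values are assumed to mean better performance.
    """
    return {
        "goal": percentile(reported_errors, goal_q),
        "criteria": percentile(reported_errors, criteria_q),
    }


def fraction_meeting(reported_errors, threshold):
    """Fraction of studies whose reported error is at or below the threshold."""
    return sum(v <= threshold for v in reported_errors) / len(reported_errors)
```

For a metric where higher values are better (a correlation coefficient, for example), the quantiles would have to be mirrored, so that "criteria" sits at the 33rd percentile and the comparison in `fraction_meeting` becomes `>=`.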