medRxiv preprint

During the study of epidemics, one of the most significant and challenging problems is forecasting future trends, on which the follow-up actions of individuals and governments heavily rely. However, picking out a reliable predictive model or method is far from simple, so a rational evaluation of the possible choices is urgently needed, especially under the severe threat of the COVID-19 epidemic, which is currently spreading worldwide.

In this paper, based on the public COVID-19 data of seven provinces/cities in China reported during the spring of 2020, we make a systematic investigation of the forecasting ability of eight empirical functions, four statistical inference methods and five dynamical models widely used in the literature. We highlight the significance of a good balance between model complexity and accuracy, between over-fitting and under-fitting, and between model robustness and sensitivity. We further introduce the Akaike information criterion (AIC), the root mean square error (RMSE) and a robustness index to quantify these three golden means and to evaluate the various epidemic models/methods.

Through extensive simulations, we find that the inflection point plays a crucial role in choosing the size of the dataset used for forecasting. Before the inflection point, none of the models considered here could make a reliable prediction. We further notice that the Logistic function consistently underestimates the final epidemic size, while the Gompertz function overestimates it in all cases. The sequential Bayesian method and the time-dependent reproduction number method both account for the effective reproduction number changing as the epidemic progresses. We therefore suggest employing them, especially in the late stage of an epidemic.
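The following Python sketch makes the comparison metrics and two of the empirical functions concrete. It implements the Logistic and Gompertz growth curves in one common parameterization. The parameters are the final size `K`, growth rate `r` and inflection time `t0`; these symbols are our assumption, not necessarily the paper's notation. The sketch also gives the RMSE and a least-squares form of the AIC, n ln(RSS/n) + 2k, which assumes Gaussian errors and drops additive constants. The robustness index is not defined in this abstract, so it is not implemented here.

```python
import math
from typing import Sequence


def logistic(t: float, K: float, r: float, t0: float) -> float:
    """Logistic cumulative-case curve; inflection at t = t0, where C(t0) = K/2."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def gompertz(t: float, K: float, r: float, t0: float) -> float:
    """Gompertz cumulative-case curve; inflection at t = t0, where C(t0) = K/e."""
    return K * math.exp(-math.exp(-r * (t - t0)))


def rss(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Residual sum of squares between observed and fitted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(rss(observed, predicted) / len(observed))


def aic_least_squares(observed: Sequence[float], predicted: Sequence[float], k: int) -> float:
    """AIC for a least-squares fit with k parameters, assuming Gaussian errors:
    n * ln(RSS / n) + 2k, additive constants dropped. Lower is better."""
    n = len(observed)
    residual = rss(observed, predicted)
    if residual == 0:
        raise ValueError("AIC is undefined for a perfect fit (RSS = 0)")
    return n * math.log(residual / n) + 2 * k


if __name__ == "__main__":
    # Suppose the same cumulative count C* is observed at the inflection point.
    c_star = 1000.0
    print("Logistic final size:", 2 * c_star)       # K/2 = C*  ->  K = 2 C*
    print("Gompertz final size:", math.e * c_star)  # K/e = C*  ->  K = e C*
```

Both curves have their fastest daily growth at `t = t0`. At that point, however, the Logistic curve has reached K/2 while the Gompertz curve has reached only K/e. The same count at the inflection point therefore implies a final size of 2C* under the Logistic function but about 2.72C* under the Gompertz function. This is one elementary way the two functions can disagree about the final epidemic size. In the AIC, the `2 * k` term penalizes each extra fitted parameter, which is how the criterion trades accuracy against complexity.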
The transition-like behavior of the exponential growth method, from underestimation to overestimation around the inflection point, might be useful for constructing more reliable forecasts.

For the dynamical models based on ODEs, we observe that the SEIR-QD and SEIR-PO models generally perform better than the SIR, SEIR and SEIR-AHQ models on the COVID-19 epidemics. Their success could be attributed to the inclusion of self-protection and quarantine, and to a proper trade-off between model complexity and fitting accuracy.
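As a point of reference for the ODE models compared above, here is a minimal Python sketch of the baseline SEIR model integrated with a fixed-step fourth-order Runge-Kutta scheme. The parameter names (`beta`, `sigma`, `gamma`) and the example values are illustrative assumptions, not estimates from the paper. The SEIR-QD and SEIR-PO variants add terms or compartments for quarantine and self-protection. They are not reproduced here because their equations are not given in this abstract.

```python
from typing import List, Tuple

State = Tuple[float, float, float, float]  # (S, E, I, R)


def seir_rhs(y: State, beta: float, sigma: float, gamma: float, n: float) -> State:
    """Right-hand side of the classic SEIR model with mass-action incidence."""
    s, e, i, r = y
    incidence = beta * s * i / n  # new exposures per unit time
    return (-incidence,
            incidence - sigma * e,  # E -> I at rate sigma
            sigma * e - gamma * i,  # I -> R at rate gamma
            gamma * i)


def rk4_step(y: State, dt: float, *params: float) -> State:
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1 = seir_rhs(y, *params)
    k2 = seir_rhs(tuple(a + 0.5 * dt * b for a, b in zip(y, k1)), *params)
    k3 = seir_rhs(tuple(a + 0.5 * dt * b for a, b in zip(y, k2)), *params)
    k4 = seir_rhs(tuple(a + dt * b for a, b in zip(y, k3)), *params)
    return tuple(a + dt / 6.0 * (p + 2 * q + 2 * u + v)
                 for a, p, q, u, v in zip(y, k1, k2, k3, k4))


def simulate_seir(y0: State, beta: float, sigma: float, gamma: float,
                  days: int, steps_per_day: int = 10) -> List[State]:
    """Integrate SEIR and return the state at the end of each day (day 0 included)."""
    n = sum(y0)  # total population, conserved by the equations
    dt = 1.0 / steps_per_day
    y, trajectory = y0, [y0]
    for _ in range(days):
        for _ in range(steps_per_day):
            y = rk4_step(y, dt, beta, sigma, gamma, n)
        trajectory.append(y)
    return trajectory


if __name__ == "__main__":
    # Hypothetical values: beta / gamma = 3.5, mean latent period 5 days, mean infectious period 7 days.
    traj = simulate_seir((9990.0, 0.0, 10.0, 0.0), beta=0.5, sigma=1 / 5, gamma=1 / 7, days=120)
    s, e, i, r = traj[-1]
    print(f"day 120: S={s:.0f} E={e:.0f} I={i:.0f} R={r:.0f}")
```

This baseline has three rate parameters. Each additional compartment or rate in an extended model raises the parameter count `k` that the AIC penalizes. Such extensions therefore have to earn their place through a correspondingly better fit.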