Abstract. Performance criteria play a key role in the calibration and evaluation of hydrological models and have been extensively developed and studied, but some of the most widely used criteria still have unknown pitfalls. This study examines counterbalancing errors, which are inherent to the Kling-Gupta Efficiency (KGE) and its variants. A total of nine performance criteria – including the KGE and its variants, as well as the Nash-Sutcliffe Efficiency (NSE) and the refined version of Willmott’s index of agreement (dr) – were analysed using synthetic time series and a real case study. Results showed that, when assessing a simulation, the score of the KGE and some of its variants can be increased by concurrent over- and underestimation of discharge. These counterbalancing errors may favour the bias and variability parameters, thereby preserving a high overall score of the performance criterion. As bias and variability parameters generally account for two-thirds of the weight in the equation of performance criteria such as the KGE, this can lead to a higher overall criterion score that is not associated with any increase in model relevance. We recommend using (i) performance criteria that are not prone, or are less prone, to counterbalancing errors (NSE, dr, modified KGE, non-parametric KGE, Diagnostic Efficiency) in a multi-criteria framework, and/or (ii) scaling factors in the equation to reduce the influence of relative parameters.
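To make the notion of counterbalancing errors concrete, the following minimal sketch computes the KGE in its original three-term form (correlation r, variability ratio alpha, bias ratio beta) alongside the NSE. The series is an assumption made purely for illustration: a hypothetical discharge record built from two identical flood events, not the synthetic series or the case study analysed in this work. The function and variable names are likewise illustrative. One simulation overestimates the first event only; the other adds an underestimation of the second event, so it contains strictly more error.

```python
import numpy as np


def kge_components(obs, sim):
    """Return (r, alpha, beta): correlation, variability ratio, bias ratio."""
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return r, alpha, beta


def kge(obs, sim):
    """Kling-Gupta Efficiency, original form with three equally weighted terms."""
    r, alpha, beta = kge_components(obs, sim)
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)


# Hypothetical observed discharge: two identical flood events on a base flow.
t = np.arange(100)
event = 1.0 + 9.0 * np.exp(-(((t - 30) / 8.0) ** 2))
obs = np.tile(event, 2)

# Simulation A: first event overestimated by 40 %, second event exact.
sim_over = obs.copy()
sim_over[:100] *= 1.4

# Simulation B: same overestimation, plus second event underestimated by 40 %.
sim_counter = sim_over.copy()
sim_counter[100:] *= 0.6

for name, sim in [("overestimation only", sim_over),
                  ("over- and underestimation", sim_counter)]:
    r, alpha, beta = kge_components(obs, sim)
    print(f"{name}: r={r:.3f} alpha={alpha:.3f} beta={beta:.3f} "
          f"KGE={kge(obs, sim):.3f} NSE={nse(obs, sim):.3f}")
```

In this constructed example the added underestimation brings the bias ratio back to 1 and moves the variability ratio closer to 1, so the KGE rises even though the correlation drops and the NSE falls. Other series or error magnitudes may behave differently, so the sketch should be read as an illustration of the mechanism rather than as a general result.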