The majority of high-performing off-policy reinforcement learning algorithms use aggregated overestimation bias control techniques. However, most of them rely on pre-defined bias correction policies that are either not flexible enough or require environment-specific hyperparameter tuning. In this work, we present a data-driven approach for automatic bias control. We demonstrate its effectiveness on three algorithms: Truncated Quantile Critics, Weighted Delayed DDPG, and Maxmin Q-learning. Our approach eliminates the need for an extensive hyperparameter search. We show that it leads to a significant reduction in the actual number of environment interactions while, in most cases, matching the performance of a resource-demanding grid search method. While reducing the bias improves performance on average, eliminating the aggregated bias entirely does not always lead to the best performance. To the best of our knowledge, this is the first time this has been demonstrated on complex environments, which highlights important pitfalls of overestimation control.
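To illustrate the kind of aggregated bias control referred to above, the sketch below shows a Maxmin Q-learning style target in which the ensemble size acts as the bias-control hyperparameter that would otherwise require environment-specific tuning. This is a minimal illustrative example, not the data-driven method proposed in this work; the names `maxmin_target` and `q_ensemble` are assumptions introduced here for illustration.

```python
# Minimal, illustrative sketch of aggregated overestimation-bias control in
# the style of Maxmin Q-learning; it is NOT the data-driven method of this
# paper. All names (maxmin_target, q_ensemble) are illustrative assumptions.
import numpy as np

def maxmin_target(q_ensemble, next_state, reward, done, gamma=0.99):
    """Bellman target built from the element-wise minimum over N critics.

    q_ensemble: list of N callables, each mapping a state to a 1-D array of
    Q-values over actions. A larger N lowers the target (less overestimation,
    potentially underestimation); N is the kind of environment-specific
    hyperparameter that, per the abstract, should be set automatically.
    """
    # Per-critic Q-values for the next state: shape (N, num_actions)
    next_q = np.stack([q(next_state) for q in q_ensemble])
    # Minimum over the ensemble, then maximum over actions (Maxmin rule)
    min_q = next_q.min(axis=0)
    return reward + gamma * (1.0 - float(done)) * min_q.max()

# Example: two random linear critics over 3 actions and a 4-dimensional state.
rng = np.random.default_rng(0)
critics = [lambda s, W=rng.normal(size=(3, 4)): W @ s for _ in range(2)]
print(maxmin_target(critics, next_state=np.ones(4), reward=1.0, done=False))
```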