We present a test of different error estimators for two-point clustering statistics, appropriate for present and future large galaxy redshift surveys. Using an ensemble of very large dark matter CDM N-body simulations, we compare internal error estimators (jackknife and bootstrap) to external ones (Monte Carlo realizations). For three-dimensional clustering statistics, we find that none of the internal error methods investigated is able to reproduce either accurately or robustly the errors of external estimators on 1 to 25 h⁻¹ Mpc scales. The standard bootstrap overestimates the variance of ξ(s) by ∼40 per cent on all scales probed, but recovers, in a robust fashion, the principal eigenvectors of the underlying covariance matrix. The jackknife returns the correct variance on large scales, but significantly overestimates it on smaller scales. This scale dependence in the jackknife affects the recovered eigenvectors, which tend to disagree on small scales with the external estimates. Our results have important implications for fitting models to galaxy clustering measurements. For example, in a two-parameter fit to the projected correlation function, we find that the standard bootstrap systematically overestimates the 95 per cent confidence interval, while the jackknife method remains biased, but to a lesser extent. Ignoring the systematic bias, the scatter between realizations, for Gaussian statistics, implies that a 2σ confidence interval, as inferred from an internal estimator, corresponds in practice to anything from 1σ to 3σ. By oversampling the subvolumes, we find that it is possible, at least for the cases we consider, to obtain robust bootstrap variances and confidence intervals that agree with external error estimates. Our results are applicable to two-point statistics, like ξ(s) and w_p(r_p), measured in large redshift surveys, and show that the interpretation of clustering measurements with internally estimated errors should be treated with caution.
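The mechanics of the two internal estimators compared above can be illustrated with a minimal sketch. It is not the pipeline used in this work: it assumes, for simplicity, that the statistic has already been measured in each subvolume separately and that the combined estimate is the plain mean over the selected subvolumes, which ignores pairs that straddle subvolume boundaries. The function names `jackknife_cov` and `bootstrap_cov`, the array layout, and the integer `oversample` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def jackknife_cov(x):
    """Delete-one jackknife covariance.

    x: array of shape (n_sub, n_bins), one measurement of the statistic
    per subvolume (assumed layout; the combined estimate is taken to be
    the mean over subvolumes).
    """
    n_sub = x.shape[0]
    total = x.sum(axis=0)
    # Estimate obtained when subvolume k is left out, for every k.
    leave_one_out = (total[None, :] - x) / (n_sub - 1)
    d = leave_one_out - leave_one_out.mean(axis=0)
    # Standard jackknife prefactor (n_sub - 1) / n_sub.
    return (n_sub - 1) / n_sub * d.T @ d

def bootstrap_cov(x, n_boot=1000, oversample=1, rng=None):
    """Bootstrap covariance from resampling subvolumes with replacement.

    Each resample draws oversample * n_sub subvolumes; oversample=1 is the
    standard bootstrap, larger values are an assumed way to model
    oversampling of the subvolumes.
    """
    rng = np.random.default_rng(rng)
    n_sub = x.shape[0]
    n_draw = oversample * n_sub
    # Indices of the subvolumes selected in each bootstrap resample.
    idx = rng.integers(0, n_sub, size=(n_boot, n_draw))
    estimates = x[idx].mean(axis=1)          # shape (n_boot, n_bins)
    d = estimates - estimates.mean(axis=0)
    return d.T @ d / (n_boot - 1)

# Usage with mock, independent Gaussian "subvolume" measurements.
mock = np.random.default_rng(42).normal(size=(27, 5))
cov_jk = jackknife_cov(mock)
cov_bs = bootstrap_cov(mock, n_boot=2000, oversample=1, rng=0)
cov_bs_over = bootstrap_cov(mock, n_boot=2000, oversample=3, rng=0)
```

The mock data here are independent draws, so the sketch only shows how the resampling is carried out; it is not meant to reproduce the biases reported above, and in this toy setting a larger `oversample` simply reduces the bootstrap variance roughly in proportion to 1/`oversample`.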