Confirming the result of a calculation by a calculation with a different method is often seen as a validity check. However, when the methods considered are all subject to the same (systematic) errors, this practice fails. Using a statistical approach, we define measures for reliability and similarity, and we explore the extent to which the similarity of results can help improve our judgment of the validity of data. This method is illustrated on synthetic data and applied to two benchmark datasets extracted from the literature: band gaps of solids estimated by various density functional approximations, and effective atomization energies estimated by ab initio and machine-learning methods. Depending on the levels of bias and correlation of the datasets, we found that similarity may provide a null-to-marginal improvement in reliability and was mostly effective in eliminating large errors.