We study the correlation between different sets of parton distributions (PDFs). Specifically, viewing different PDF sets as distinct, generally correlated determinations of the same underlying physical quantity, we examine the extent to which the correlation between them is due to the underlying data. We do this both for pairs of PDF sets determined using the same fixed methodology and for pairs determined using different methodologies. We show that the correlations have a sizable component that is not due to the underlying data, because the data do not determine the PDFs uniquely. We show that the data-driven correlations can be used to assess the efficiency of the methodologies used for PDF determination. We also show that using data-driven correlations to combine different PDFs into a joint set can lead to inconsistent results, and thus that the statistical combination used in constructing the widely used PDF4LHC15 PDF set remains the most reliable method.