Determining whether different items provide the same information, or mean the same thing within a population, is a central concern when assessing whether different scales or constructs overlap or are redundant. In the present study, we suggest that retest-adjusted correlations provide a valuable means of adjusting for item-level unreliability. More exactly, we suggest dividing the estimated correlation between items X and Y, measured over an interval |d|, by the average of the two items' retest correlations over the same interval. For instance, if we correlate scores from items X and Y measured 1 week apart, their retest-adjusted correlation is estimated using their 1-week retest correlations. Using data from four inventories, we provide evidence that retest-adjusted correlations are significantly better predictors than raw-score correlations of whether judges consensually regard two items as “meaning the same thing.” The results may provide the first empirical evidence that Spearman’s (1904, 1910) suggested reliability adjustments do, in certain (perhaps very constrained!) circumstances, improve upon raw-score correlations as indicators of the informational or semantic equivalence of different tests.
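
To make the adjustment concrete, the following is a minimal sketch of the computation described above, not the authors' own code. It assumes that "average" means the arithmetic mean of the two retest correlations and that the cross-item correlation is taken between X at the first occasion and Y at the second; both choices, and all function names, are illustrative assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length score lists."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def retest_adjusted(r_xy, r_xx, r_yy):
    """Divide the X-Y correlation by the mean retest correlation.

    Assumption: arithmetic mean of the two retest correlations,
    all three estimated over the same measurement interval.
    """
    mean_retest = (r_xx + r_yy) / 2
    if mean_retest <= 0:
        raise ValueError("mean retest correlation must be positive")
    return r_xy / mean_retest


def retest_adjusted_from_scores(x_t1, x_t2, y_t1, y_t2):
    """Hypothetical helper: compute the adjustment from raw two-wave scores.

    Assumption: the cross-item correlation pairs X at time 1 with Y at
    time 2, so it spans the same interval as the retest correlations.
    """
    r_xy = pearson(x_t1, y_t2)
    r_xx = pearson(x_t1, x_t2)
    r_yy = pearson(y_t1, y_t2)
    return retest_adjusted(r_xy, r_xx, r_yy)
```

For example, a raw correlation of .42 between two items whose 1-week retest correlations are .60 and .80 would be divided by .70, giving a retest-adjusted correlation of .60. Note that this differs from the classic disattenuation formula, which divides by the geometric mean (the square root of the product) of the two reliabilities.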