Combining data from different studies has a long tradition in the scientific community. Pooling individual data requires that the same information be collected in each study. When studies have used different methods or instruments (e.g., questionnaires) to measure the same characteristics or constructs, the observed variables must be harmonized to obtain equivalent content information across studies. This paper formulates the main concepts for harmonizing test scores from different observational studies in terms of latent variable models, namely calibration, invariance, and exchangeability. Although similar ideas are present in measurement reliability and test equating, harmonization is different from measurement invariance and generalizes test equating. In addition, if a test score needs to be transformed to another test score, harmonization of variables is only possible under specific conditions. Observed test scores that connect all of the different studies are necessary to test the underlying assumptions of harmonization. The concepts of harmonization are illustrated with multiple memory test scores from three different Canadian studies.