Researchers often compare groups of individuals on psychological variables. Such comparisons rest on the assumption that the instrument measures the same psychological construct in all groups. If this assumption holds, the comparisons are valid and differences or similarities between groups can be meaningfully interpreted. If it does not hold, comparisons and interpretations are not fully meaningful. Establishing measurement invariance is therefore a prerequisite for meaningful comparisons across groups. This paper first reviews the importance of equivalence in psychological research, and then the main theoretical and methodological issues surrounding measurement invariance within the framework of confirmatory factor analysis. A step-by-step empirical example of measurement invariance testing is provided, along with syntax examples for fitting such models in LISREL.

Key words: measurement invariance, cross-cultural research, confirmatory factor analysis, LISREL.
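Step-by-step invariance testing of this kind typically compares nested CFA models (e.g., configural vs. metric invariance) with a chi-square difference test. A minimal sketch of that comparison step follows; the fit statistics are hypothetical, not taken from the paper's empirical example:

```python
from scipy.stats import chi2

def chi_square_difference(chisq_constrained, df_constrained,
                          chisq_free, df_free):
    """Likelihood-ratio (chi-square difference) test between two
    nested CFA models, e.g. a metric-invariance model (equal
    loadings) against the less constrained configural model."""
    delta_chisq = chisq_constrained - chisq_free
    delta_df = df_constrained - df_free
    # A significant difference indicates that the added equality
    # constraints worsen fit, i.e. invariance does not hold.
    p_value = chi2.sf(delta_chisq, delta_df)
    return delta_chisq, delta_df, p_value

# Hypothetical fit statistics for a two-group comparison:
# metric model chi2 = 112.4 (df = 58), configural chi2 = 98.7 (df = 52)
d_chi, d_df, p = chi_square_difference(112.4, 58, 98.7, 52)
```

Here a small p-value would argue against metric invariance; in practice this chi-square test is usually read alongside fit-index differences (e.g., ΔCFI), since the test is sensitive to sample size.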
RESUMEN

Researchers often compare groups of individuals on different psychological variables. When groups are compared, it is assumed that the measurement instrument captures the same psychological constructs in all groups. If this assumption holds, the comparisons are valid and differences/similarities between groups can be appropriately interpreted. If it does not hold, the comparisons and interpretations lose validity. Establishing measurement invariance is an essential prerequisite for appropriate comparisons between groups. This article first presents the importance of invariance in psychological research and then discusses theoretical and methodological issues concerning measurement invariance within the framework of confirmatory factor analysis. A LISREL example illustrating measurement invariance testing is presented.

Key words: measurement invariance, cross-cultural research, confirmatory factor analysis, LISREL.
Author Contributions

CGS and FKB contributed equally to the article; they conceptualized and designed the study and planned the pre-registration. CGS processed the data and conducted the analyses. FKB with LMG and NCO conducted the literature review and drafted the manuscript, with significant input from CGS, NS, MSW, CHJL, PM, JB, DO, TLM, CAH, IMD, and RVJ. NS and LMG prepared the figures.
We conducted preregistered replications of 28 classic and contemporary published findings, with protocols that were peer reviewed in advance, to examine variation in effect magnitudes across samples and settings. Each protocol was administered to approximately half of 125 samples that comprised 15,305 participants from 36 countries and territories. Using the conventional criterion of statistical significance (p < .05), we found that 15 (54%) of the replications provided evidence of a statistically significant effect in the same direction as the original finding. With a strict significance criterion (p < .0001), 14 (50%) of the replications still provided such evidence, a reflection of the extremely high-powered design. Seven (25%) of the replications yielded effect sizes larger than the original ones, and 21 (75%) yielded effect sizes smaller than the original ones. The median comparable Cohen's ds were 0.60 for the original findings and 0.15 for the replications. The effect sizes were small (< 0.20) in 16 of the replications (57%), and 9 effects (32%) were in the direction opposite that of the original effect. Across settings, the Q statistic indicated significant heterogeneity in 11 (39%) of the replication effects, and most of those were among the findings with the largest overall effect sizes; only 1 effect that was near zero in the aggregate showed significant heterogeneity according to this measure. Only 1 effect had a tau value greater than .20, an indication of moderate heterogeneity. Eight others had tau values near or slightly above .10, an indication of slight heterogeneity. Moderation tests indicated that very little heterogeneity was attributable to the order in which the tasks were performed or to whether the tasks were administered in the lab versus online.
Exploratory comparisons revealed little heterogeneity between Western, educated, industrialized, rich, and democratic (WEIRD) cultures and less WEIRD cultures (i.e., cultures with relatively high and low WEIRDness scores, respectively). Cumulatively, variability in the observed effect sizes was attributable more to the effect being studied than to the sample or setting in which it was studied.
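The heterogeneity measures cited above (Cochran's Q and tau) can be sketched with a minimal random-effects computation using the DerSimonian-Laird estimator; the per-site effect sizes and variances below are hypothetical, not the study's data:

```python
import numpy as np

def cochran_q_and_tau(effects, variances):
    """Cochran's Q statistic and the DerSimonian-Laird estimate of
    tau (between-site SD) for per-site effect sizes (e.g. Cohen's d)
    with known sampling variances."""
    d = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)  # inverse-variance weights
    mu_fixed = np.sum(w * d) / np.sum(w)          # fixed-effect pooled estimate
    q = np.sum(w * (d - mu_fixed) ** 2)           # weighted squared deviations
    k = len(d)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)            # truncated at zero
    return q, np.sqrt(tau2)

# Hypothetical Cohen's ds from five replication sites, equal variances
q, tau = cochran_q_and_tau([0.10, 0.15, 0.20, 0.12, 0.18],
                           [0.01, 0.01, 0.01, 0.01, 0.01])
```

With Q below its degrees of freedom (k - 1), the tau estimate truncates to zero, which corresponds to the "little heterogeneity" pattern the abstract describes for most effects.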