In psychological science, self-report scales are widely used to compare means of targeted latent constructs across time points, groups, or experimental conditions. For these scale mean comparisons (SMCs) to be meaningful and unbiased, the scales should be measurement invariant across the compared time points or (experimental) groups. Measurement invariance (MI) testing checks whether the latent constructs are measured equivalently across groups or time points. Because MI is essential for meaningful comparisons, we conducted a systematic review to check whether MI is taken seriously in psychological research. Specifically, we sampled 426 psychology articles with openly available data, involving a total of 918 SMCs, to (1) investigate common practices in conducting and reporting MI tests, (2) check whether reported MI test results can be reproduced, and (3) conduct MI tests for the SMCs for which the shared data enabled sufficiently powerful testing. Our results indicate that (1) only 4% of the 918 SMCs were accompanied by MI tests across groups or time, and these tests were generally poorly reported; (2) none of the reported MI tests could be successfully reproduced; and (3) of 161 newly performed MI tests, a mere 46 (29%) reached sufficient MI (scalar invariance), and MI often failed completely (89; 55%). Thus, MI tests were rarely conducted and poorly reported in psychological studies, and the frequent violations of MI imply that reported group differences cannot be attributed solely to differences in the latent constructs. We offer recommendations on reporting MI tests and improving computational reproducibility practices.