Critical thinking is a common aim of higher education, often described both as a general competency to be acquired across entire programs and as a set of domain-specific skills to be acquired within subjects. The aim of the study was to investigate whether statistics-specific critical thinking changed from the start of the first semester to the start of the second semester of a two-semester statistics course whose curriculum contains learning objectives and assessment criteria related to critical thinking. The brief version of the Critical Thinking scale (CTh) from the Motivated Strategies for Learning Questionnaire addresses the core aspects of critical thinking common to three different definitions of critical thinking. Students rate item statements in relation to their statistics course on a frequency scale: 1 = never, 2 = rarely, 3 = sometimes, 4 = often, and 5 = always. Participants were two consecutive year-cohorts of full-time Bachelor of Psychology students taking a statistics course that spans their first two semesters. Data were collected in class with a paper-and-pencil survey 1 month into the first semester and again 1 month into the second. The study sample consisted of 336 students at baseline (n = 166 in cohort 1, n = 170 in cohort 2); the follow-up was completed by 270 students, of whom 165 could be matched to their baseline responses. To investigate the measurement properties of the CTh scale, item analysis using the Rasch model was conducted on the baseline data and subsequently on the follow-up data. Change scores at the group level were calculated as the standardized effect size (ES), i.e., the mean difference between follow-up and baseline scores divided by the standard deviation of the baseline scores. The data fitted Rasch models at both baseline and follow-up, and the targeting of the CTh scale to the student sample was excellent at both time points.
Individual (unstandardized) changes on the CTh ranged from −5.3 to 5.1 points, indicating large individual changes in critical thinking. The overall standardized effect was small and negative (ES = −0.12), with some variation across student strata defined by gender, age, perceived adequacy of math knowledge for learning statistics, and expectation of needing statistics in future employment.
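The change measures reported above can be illustrated with a short sketch. It assumes matched baseline and follow-up CTh scale scores for each student, defines individual change as follow-up minus baseline (so a negative ES indicates a decrease), and takes the baseline standard deviation as the sample SD of the matched students' baseline scores; these are assumptions, not details confirmed by the study. The function names and the example scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-student change (follow-up minus baseline) for matched pairs.

    Hypothetical helper: assumes baseline[i] and follow_up[i] belong to the same student.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be matched pairs")
    return [f - b for b, f in zip(baseline, follow_up)]


def standardized_effect_size(baseline, follow_up):
    """Group-level ES: mean change divided by the sample SD of the baseline scores.

    Assumption: the sample SD (n - 1 denominator) of the matched baseline scores is used.
    """
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(baseline)


# Hypothetical scores for four matched students (not study data).
baseline = [10, 12, 14, 16]
follow_up = [9, 12, 13, 17]
changes = change_scores(baseline, follow_up)
print(min(changes), max(changes))  # → -1 1 (range of individual changes)
print(round(standardized_effect_size(baseline, follow_up), 3))  # → -0.097
```

Dividing by the baseline SD rather than the SD of the change scores expresses the group shift in units of the initial between-student spread, which is why a sample can show a wide range of individual changes and still a small overall ES.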