This article examines the interdependency of two context effects that are known to occur regularly in large-scale assessments: item position effects and effects of test-taking effort on the probability of correctly answering an item. A microlongitudinal design was used to measure test-taking effort over the course of a 60-minute large-scale assessment. Two components of test-taking effort were investigated: initial effort and change in effort. Both components significantly affected the probability of solving an item. In addition, participants' current test-taking effort diminished considerably over the course of the test. Furthermore, a substantial linear position effect was found, indicating that item difficulty increased during the test; this position effect varied considerably across persons. Concerning the interplay of position effects and test-taking effort, only the change in effort moderated the position effect, and persons differed with respect to this moderation effect. The consequences of these results for the reliability and validity of large-scale assessments are discussed.
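To make the described model structure concrete, the following Python sketch simulates item responses from a simplified logistic response model with person ability, item difficulty, a person-specific linear position effect, and two effort components (initial effort and change in effort). All parameter values are invented for illustration; this is not the authors' actual analysis.

```python
# Illustrative sketch only: a simplified response model in the spirit of the
# abstract above. Person ability, item difficulty, a person-specific linear
# position effect, and two effort components jointly determine the probability
# of a correct response. All numbers are assumptions, not estimates from the study.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 500, 40

ability = rng.normal(0.0, 1.0, n_persons)        # person ability
difficulty = rng.normal(0.0, 1.0, n_items)       # base item difficulty
position = np.arange(n_items) / (n_items - 1)    # 0 = first item, 1 = last item

# Person-specific position effect: on average, performance drops toward the end.
pos_slope = rng.normal(-0.8, 0.3, n_persons)

# Effort components: initial effort and (mostly negative) change in effort.
initial_effort = rng.normal(0.0, 1.0, n_persons)
effort_change = rng.normal(-0.5, 0.2, n_persons)
b_init, b_change = 0.4, 0.6                      # assumed effort weights

# Linear predictor for every person-by-item combination.
eta = (ability[:, None]
       - difficulty[None, :]
       + pos_slope[:, None] * position[None, :]
       + b_init * initial_effort[:, None]
       + b_change * effort_change[:, None] * position[None, :])

p_correct = 1.0 / (1.0 + np.exp(-eta))
responses = rng.binomial(1, p_correct)

print("Proportion correct, first vs. last quarter of the test:",
      responses[:, :10].mean().round(3), responses[:, -10:].mean().round(3))
```

Under these assumed parameters, the simulated proportion correct declines across item positions, mimicking the kind of position effect the abstract reports.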
Background: Low-stakes assessments have no consequences for the test-takers. Current research on motivation indicates that a lack of test-taking motivation can decrease students' performance in low-stakes assessments. However, little research has explored the domain-specific and situation-specific aspects of motivation simultaneously, and research examining differences in test-taking motivation among students in different types of schools is also limited. Our study therefore addressed the motivational determinants of test performance in low-stakes assessments in general, as well as school-track-specific differences in particular. Method: Drawing on national data from students who participated in a cross-national study of educational achievement, we conducted multiple regression analyses to predict the students' test performance and the effort they invested in that test. We conducted the analyses for the entire sample as well as for the students in that sample separated according to the school track they were attending.
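As a rough illustration of this analysis pattern, the Python sketch below fits an ordinary least squares regression for a synthetic full sample and then separately per school track. The data are simulated and all variable names (effort, domain_interest, school_track) are placeholders, not the study's actual instruments or results.

```python
# Purely illustrative: regression of test performance on motivation variables,
# run for the full (synthetic) sample and then separately per school track.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1200
df = pd.DataFrame({
    "effort": rng.normal(0, 1, n),                          # situation-specific motivation
    "domain_interest": rng.normal(0, 1, n),                 # domain-specific motivation
    "school_track": rng.choice(["academic", "vocational"], n),
})
df["performance"] = (0.4 * df["effort"]
                     + 0.3 * df["domain_interest"]
                     + rng.normal(0, 1, n))

spec = "performance ~ effort + domain_interest"

# Full-sample model.
print(smf.ols(spec, data=df).fit().params.round(2))

# Separate models per school track.
for track, group in df.groupby("school_track"):
    print(track, smf.ols(spec, data=group).fit().params.round(2).to_dict())
```

An analogous model with effort as the outcome would mirror the second analysis mentioned in the abstract.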
Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been documented repeatedly. In a simulation study, we compared the trend estimation performance of an approach based on national item parameters, which requires trends to be computed separately for each country, with that of two linking methods employing international item parameters, across several conditions. The trend estimates based on national item parameters were more accurate than those based on international item parameters when cross-national DIF was present. Moreover, the use of fixed common item parameter calibrations led to biased trend estimates. Detecting and eliminating DIF can reduce this bias but is also likely to increase the total error.
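For readers unfamiliar with linking, the sketch below shows one common approach, mean-mean linking under a Rasch-type model, which places two assessment cycles on a common scale via shared link items so that a trend can be read off. The numbers are invented and the sketch does not reproduce the study's simulation design or its national-versus-international comparison.

```python
# Minimal sketch of mean-mean linking between two assessment cycles, assuming a
# Rasch-type model with item difficulties estimated separately in each cycle.
import numpy as np

rng = np.random.default_rng(2)
n_link = 20

b_cycle1 = rng.normal(0.0, 1.0, n_link)        # link-item difficulties on the cycle-1 scale
scale_shift = 0.35                             # unknown shift of the cycle-2 scale
b_cycle2 = b_cycle1 + scale_shift + rng.normal(0, 0.05, n_link)   # cycle-2 estimates

# Mean-mean linking: the mean difficulty difference estimates the scale shift.
link_constant = b_cycle1.mean() - b_cycle2.mean()

# Cycle-2 person ability estimates (placeholder values): a true gain of about
# 0.20 on the cycle-1 scale, plus the 0.35 shift of the cycle-2 scale.
theta_cycle2 = rng.normal(0.55, 1.0, 1000)
theta_cycle2_linked = theta_cycle2 + link_constant

theta_cycle1 = rng.normal(0.0, 1.0, 1000)      # cycle-1 ability estimates
trend = theta_cycle2_linked.mean() - theta_cycle1.mean()
print("estimated trend on the common scale:", round(trend, 3))
```

If the link items function differently in a given country (cross-national DIF), the linking constant derived from international parameters no longer reflects that country's scale, which is the mechanism behind the bias the abstract describes.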