Introduction
A recent paper by Kaplan and Su (2016) investigated the problem of matrix sampling of context questionnaires with respect to the generation of the plausible values (PVs) of the so-called "cognitive" tests in large-scale educational assessments. Drawing on earlier work by Adams et al. (2013) based on PISA 2012 (OECD 2014) and motivated by the desire among policy-makers to increase non-cognitive content in national and international large-scale assessments, Kaplan and Su found that matrix sampling of context questionnaire (CQ) material followed by predictive mean matching imputation can

Abstract
Background: This paper extends a recent study by Kaplan and Su (J Educ Behav Stat 41: 51-80, 2016) examining the problem of matrix sampling of context questionnaire scales with respect to the generation of plausible values of cognitive outcomes in large-scale assessments.
Methods: Following Weirich et al. (Nested multiple imputation in large-scale assessments. In: Large-scale assessments in education, 2. http://www.largescaleassessmentsineducation.com/content/2/1/9, 2014), we examine single + multiple imputation and multiple + multiple imputation methods using predictive mean matching imputation under three different context questionnaire matrix sampling designs: a two-form design studied by Adams et al. (On the use of rotated context questionnaires in conjunction with multilevel item response models. In: Large-scale assessments in education. http://www.largescaleassessmentsineducation.com/content/1/1/5, 2013), a three-form design implemented in PISA 2012, and a partially balanced incomplete block design studied by Kaplan and Su (J Educ Behav Stat 41: 51-80, 2016).
Results: Our results show that the choice of design has a larger impact on the reduction of bias than the choice of imputation method. Specifically, the three-form design used in PISA 2012 yields considerably less bias than either the two-form design or the partially balanced incomplete block design. We further show that the partially balanced incomplete block design produces less bias than the two-form design despite having the same amount of missing data.
Conclusions: We discuss the implications of these results for the design of context questionnaires in large-scale assessments.
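To make the imputation approach named in the Methods concrete, the following is a minimal sketch of predictive mean matching under a two-form planned-missing design. It is an illustration under stated assumptions, not the procedure used in the study: the simulated scales `x` and `y_full`, the random form assignment, the donor-pool size `k`, and the function name `pmm_impute` are all hypothetical. The sketch also uses a single least-squares fit and does not draw the regression parameters from their posterior, as fuller implementations of predictive mean matching typically do.

```python
import numpy as np

def pmm_impute(x, y, k=5, rng=None):
    """Impute NaN entries of y by predictive mean matching on x (simplified)."""
    rng = np.random.default_rng() if rng is None else rng
    observed = ~np.isnan(y)
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(len(x)), x])
    # Least-squares fit using only the cases where y was administered.
    beta, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
    predicted = X @ beta
    donor_pred = predicted[observed]
    donor_vals = y[observed]
    imputed = y.copy()
    for i in np.flatnonzero(~observed):
        # Indices of the k donors whose predicted means are closest.
        nearest = np.argsort(np.abs(donor_pred - predicted[i]))[:k]
        # Borrow the observed value of one randomly chosen donor.
        imputed[i] = donor_vals[rng.choice(nearest)]
    return imputed

# Hypothetical data: x is a scale assumed to appear on every form,
# y_full is a rotated scale assumed to appear on form A only.
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y_full = 0.6 * x + rng.normal(scale=0.8, size=n)
form = rng.integers(0, 2, size=n)            # 0 = form A, 1 = form B
y_obs = np.where(form == 0, y_full, np.nan)  # form B omits the scale

# Multiple imputation: M completed copies of the rotated scale.
M = 5
completed = [pmm_impute(x, y_obs, k=5, rng=rng) for _ in range(M)]
```

Because each imputed value is borrowed from an observed donor, the completed scale only ever takes values that actually occurred in the data. The M completed copies differ only through the random donor draws.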
Open Access © The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.