Measures of cognitive or socio-emotional skills from large-scale assessment surveys (LSAS) are often based on advanced statistical models and scoring techniques unfamiliar to applied researchers. Consequently, applied researchers working with data from LSAS may be uncertain about the assumptions and computational details of these statistical models and scoring techniques and about how best to incorporate the resulting skill measures in secondary analyses. The present paper is intended as a primer for applied researchers. After a brief introduction to the key properties of skill assessments, we give an overview of the three principal methods with which secondary analysts can incorporate skill measures from LSAS in their analyses: (1) as test scores (i.e., point estimates of individual ability), (2) through structural equation modeling (SEM), and (3) in the form of plausible values (PVs). We discuss the advantages and disadvantages of each method based on three criteria: fallibility (i.e., control for measurement error and unbiasedness), usability (i.e., ease of use in secondary analyses), and immutability (i.e., consistency of test scores, PVs, or measurement model parameters across different analyses and analysts). We show that although none of the methods is optimal under all criteria, methods that result in a single point estimate of each respondent’s ability (i.e., all types of “test scores”) are rarely optimal for research purposes. Instead, approaches that avoid or correct for measurement error, especially PV methodology, stand out as the methods of choice. We conclude with practical recommendations for secondary analysts and data-producing organizations.
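To make the PV approach concrete, here is a minimal R sketch of how a secondary analyst might pool a regression across plausible values using Rubin's combining rules. The data frame `dat`, the outcome `y`, and the PV columns `pv1` through `pv5` are invented stand-ins so the sketch runs; real LSAS data files have their own naming conventions and typically require survey weights as well.

```r
# Stand-in data so the sketch runs; real PVs come from the LSAS data file.
set.seed(7)
n   <- 1000
dat <- data.frame(y = rnorm(n))
for (i in 1:5) dat[[paste0("pv", i)]] <- 0.4 * dat$y + rnorm(n)

# Fit the same regression once per plausible value.
pv_cols <- paste0("pv", 1:5)
fits <- lapply(pv_cols, function(pv) lm(reformulate(pv, response = "y"), data = dat))

est <- sapply(fits, function(f) coef(f)[2])      # slope estimate per PV
se2 <- sapply(fits, function(f) vcov(f)[2, 2])   # squared standard error per PV

# Rubin's (1987) combining rules.
m          <- length(est)
pooled_est <- mean(est)               # pooled point estimate
w          <- mean(se2)               # within-imputation variance
b          <- var(est)                # between-imputation variance
pooled_se  <- sqrt(w + (1 + 1/m) * b) # total variance
c(estimate = pooled_est, se = pooled_se)
```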
Researchers commonly evaluate the fit of latent-variable models by comparing canonical fit indices (χ², CFI, RMSEA, SRMR) against fixed cutoffs derived from simulation studies. However, the performance of fit indices varies greatly across empirical settings, and fit indices are susceptible to extraneous influences other than model misspecification. This threatens the validity of model judgments based on fixed cutoffs. As a solution, methodologists have proposed four principal approaches to tailoring cutoffs and the set of fit indices to the specific empirical setting at hand, which we review here. Extending this line of research, we then introduce a refined approach that allows (1) generating tailored cutoffs while also (2) identifying well-performing fit indices in the given scenario. Our simulation-cum-ROC approach combines a Monte Carlo simulation with receiver operating characteristic (ROC) analysis. The Monte Carlo simulation generates distributions of fit indices under different assumptions about the population model that may have generated the data. ROC analysis evaluates the fit indices’ ability to discriminate between correctly specified and misspecified analysis models, allows selecting well-performing indices, and identifies cutoffs for these indices that minimize Type I and Type II errors. The simulation-cum-ROC approach provides an alternative to fixed cutoffs, allows for more valid decisions about accepting or rejecting a model, and improves on prior approaches to tailored cutoffs. We provide a Shiny app that makes our approach easy to apply.
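As an illustration of the core idea (not the authors' implementation), the following R sketch uses lavaan to simulate fit-index distributions under a correctly specified and a misspecified population model and then derives a tailored CFI cutoff via a simple ROC analysis. The population loadings, the omitted residual correlation, and the sample size are invented for the example.

```r
# Illustrative sketch of the simulation-cum-ROC idea; all population
# values below are assumptions made for this example.
library(lavaan)

pop_correct <- 'f =~ 0.7*x1 + 0.7*x2 + 0.7*x3 + 0.7*x4 + 0.7*x5 + 0.7*x6'
pop_misspec <- paste(pop_correct, 'x1 ~~ 0.3*x2', sep = '\n')  # residual correlation
analysis    <- 'f =~ x1 + x2 + x3 + x4 + x5 + x6'              # omits it

# Step 1: Monte Carlo distributions of CFI under each population model.
sim_cfi <- function(pop, reps = 200, n = 300) {
  replicate(reps, {
    d <- simulateData(pop, sample.nobs = n)
    fitMeasures(cfa(analysis, data = d), "cfi")
  })
}
cfi_ok  <- sim_cfi(pop_correct)   # analysis model correctly specified
cfi_bad <- sim_cfi(pop_misspec)   # analysis model misspecified

# Step 2: manual ROC; "model misspecified" is the positive class.
cutoffs <- sort(unique(c(cfi_ok, cfi_bad)))
youden <- sapply(cutoffs, function(cut) {
  sens <- mean(cfi_bad < cut)   # misspecified models flagged (low CFI)
  spec <- mean(cfi_ok >= cut)   # correct models retained
  sens + spec - 1
})
cutoffs[which.max(youden)]      # tailored CFI cutoff for this scenario
```

The same loop can collect several fit indices at once; comparing their Youden's J values then indicates which indices discriminate well in the scenario at hand.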
This article addresses a fundamental question in the study of socio-emotional skills, personality traits, and related constructs: “To score or not to score?” When researchers use test scores or scale scores (i.e., fallible point estimates of a skill or trait) as predictors in multiple regression, measurement error in these scores tends to attenuate the regression coefficient of the skill and inflate those of the covariates. Unlike for cognitive assessments, it is not fully established how severe this bias can be in socio-emotional skill assessments, that is, how well test scores recover the true regression coefficients compared with methods designed to account for measurement error: structural equation modeling (SEM) and plausible values (PV). The types of scores considered in this study are standardized mean scores (SMS), regression factor scores (RFS), empirical Bayes modal (EBM) scores, weighted maximum likelihood estimates (WLE), and expected a posteriori (EAP) estimates. We present a simulation study in which we compared these approaches under conditions typical of socio-emotional skill and personality assessments. We examined the performance of the five types of test scores, PV, and SEM with regard to two outcomes: (1) percent bias in the regression coefficient of the skill in predicting an outcome and (2) percent bias in the regression coefficient of a covariate. We varied the number of items, factor loadings/item discriminations, sample size, and relative strength of the relationship of the skill with the outcome. Results revealed that although the different types of test scores were highly correlated with each other, the ensuing bias in regression coefficients varied considerably. The magnitude of bias was highest for WLE with short scales of low reliability. Bias when using SMS or WLE test scores was sometimes large enough to lead to erroneous research conclusions with potentially adverse implications for policy and practice (up to 55% for the regression coefficient of the skill and 20% for that of the covariate). EAP, EBM, and RFS performed better, producing only small bias in some conditions. Additional analyses showed that the performance of test scores also depended on whether standardized or unstandardized scores were used. Only PV and SEM performed well in all scenarios and emerged as the clearly superior options. We recommend that researchers use SEM, and preferably PV, in studies on the (incremental) predictive power of socio-emotional skills.
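The attenuation-and-inflation mechanism the study quantifies can be illustrated with a toy R simulation (not the paper's design): regressing an outcome on a fallible mean score shrinks the skill's coefficient and inflates that of a correlated covariate. All population values below are invented for the demonstration.

```r
# Toy illustration of attenuation from measurement error in a mean score.
set.seed(1)
n     <- 5000
skill <- rnorm(n)                                  # true latent skill
covar <- 0.5 * skill + rnorm(n, sd = sqrt(0.75))   # covariate, r = .5 with skill
y     <- 0.4 * skill + 0.2 * covar + rnorm(n)      # true slopes: 0.4 and 0.2

k     <- 4                                         # few items, modest loadings
items <- replicate(k, 0.6 * skill + rnorm(n, sd = 0.8))
sms   <- as.numeric(scale(rowMeans(items)))        # standardized mean score

coef(lm(y ~ skill + covar))  # recovers ~0.4 and ~0.2
coef(lm(y ~ sms + covar))    # skill slope attenuated, covariate slope inflated
```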
Teacher self-efficacy (TSE) is a frequently studied construct due to its positive relations with student outcomes. However, the TSE of teachers in inclusive early childhood special education (ECSE) classrooms has seldom been studied. To fill this gap, we examined the extent to which (a) teachers exhibited differing levels of TSE for individual students; (b) children’s characteristics, particularly disability status and learning behaviors, were associated with TSE; and (c) relations between children’s characteristics and TSE remained consistent across an academic year. Thirty-seven teachers of inclusive ECSE classrooms completed surveys ascertaining their student-specific TSE and the learning behaviors of 114 children. Results indicated that teachers had different levels of TSE for the students in their classrooms. Children’s characteristics, particularly their attention/persistence, were related to TSE, with more relations between TSE and children’s characteristics at the start of the school year than at the end. Implications for teacher professional development are discussed.
Multilevel modeling, also known as hierarchical or mixed effects modeling, is a statistical technique for fitting nested or clustered data. Multilevel modeling aids in examining associations between variables measured at different levels of the data structure (Raudenbush & Bryk, 2002; Hox et al., 2018). In these models, residual components are introduced at each level of the structure. Multilevel modeling is a popular method of analysis and is widely used in the social sciences (students nested within classrooms), natural sciences (burrowing habits across locations), physical sciences (distribution of light over the sky), and medicine (tracking drug doses over time). In recent years, there has been considerable research on modeling clustered data with methods such as structural equation modeling (SEM), mediation analysis, mixture modeling, and propensity score analysis, to name a few. Due to the ubiquity of the method, all major statistical software packages offer implementations of multilevel modeling. In particular, R has become widely popular among researchers and teachers for fitting multilevel models. With the development of multilevel modeling in research and software, it is crucial that researchers acquaint themselves with the method and learn to implement it in R.

There are several books on multilevel modeling that offer technical and non-technical treatments of the subject. Some books also offer guidance on fitting these models using different software. There are a couple of excellent books that cover the implementation of multilevel modeling in R: Linear Mixed-Effects Models Using R (Galecki & Burzykowski, 2013) and Mixed Effects Models and Extensions in Ecology with R (Zuur et al., 2009). Yet this book is a welcome addition to the topic. Its most important contribution is the communication of the fundamentals of multilevel modeling, and some of its advanced issues, in a non-technical and easy-to-understand format. The comprehensive description of R functions and syntax makes it easier for a novice R user to fit multilevel models and understand the resulting output. The website associated with the first edition of the book (http://www.mlminr.com/) contains the data sets used in the examples, which readers can download for practice. The second edition includes examples with newer packages, functions, model fit measures, and additional advanced topics, but the above-mentioned website has not been updated to accommodate these changes. However, the book contains all the R code and syntax necessary to replicate the examples.

Primary audience of the book: This book is mainly geared toward readers in need of an introduction to multilevel models, as it does not delve into advanced topics or technical details of the methodology. However, seasoned researchers and course instructors wanting to include R implementations will also find the book useful. The book covers a broad range of themes and examples, making it easier for readers from varied backgrounds to follow the content.

Structure of the book: The boo...
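For readers new to the topic, the kind of model the book teaches can be fit in a few lines of R, for example with the lme4 package (a common choice for such models, though not the only one the book covers). The data set and variable names below are invented for illustration.

```r
library(lme4)

# Toy data: students nested within classrooms (invented for illustration).
set.seed(42)
school_df <- data.frame(
  classroom = factor(rep(1:20, each = 15)),
  ses       = rnorm(300)
)
school_df$math <- 0.3 * school_df$ses +
  rep(rnorm(20, sd = 0.5), each = 15) +  # classroom-level residual component
  rnorm(300)                             # student-level residual component

# Random-intercept model: fixed slope for ses, intercepts vary by classroom.
fit <- lmer(math ~ ses + (1 | classroom), data = school_df)
summary(fit)
VarCorr(fit)  # variance components at each level of the nesting
```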