Bi-factor confirmatory factor models have been influential in research on cognitive abilities because they often fit the data better than correlated factors and higher-order models. They also instantiate a perspective that differs from that offered by the other models. Motivated by previous work that hypothesized an inherent statistical bias of fit indices in favor of the bi-factor model, we compared the fit of correlated factors, higher-order, and bi-factor models via Monte Carlo methods. When data were sampled from a true bi-factor structure, each of the approximate fit indices was more likely than not to identify the bi-factor solution as the best fitting. When samples were drawn from a true multiple correlated factors structure, approximate fit indices were more likely overall to identify the correlated factors solution as the best fitting. In contrast, when samples were generated from a true higher-order structure, approximate fit indices tended to identify the bi-factor solution as the best fitting. Fit values overlapped extensively across the models regardless of the true structure. Although one model may fit a given dataset best relative to the others, each of the models tended to fit the data well in absolute terms. Given this variability, models must also be judged on substantive and conceptual grounds.
The last five to ten years have seen a renewed interest in the stability of teacher behavior and effectiveness. Data on teacher performance and teacher effectiveness are increasingly used as the basis for decisions about continued employment, tenure and promotion, and financial bonuses. The purpose of this study is to explore the stability of both teacher performance and effectiveness by determining the extent to which the performance and effectiveness of individual teachers fluctuate over time. The sample consisted of 132 teachers for whom both observational and state standardized test data were available for five consecutive years. Neither teacher performance nor effectiveness was highly stable over the multiple years of the study. The observed relationship between teacher performance and teacher effectiveness was reasonably stable over time, but the magnitude of the relationship was quite small. Teacher performance was also likely to be inflated in low performing schools. We also discuss when different observed patterns may be acceptable based on the purpose for which the data are used. Title, as given with the Spanish abstract: "The stability of teacher performance and effectiveness: Implications for teacher evaluation policies". Keywords: teacher evaluation policy; effectiveness; teacher performance; teacher stability; value-added. Source: Education Policy Analysis Archives, Vol. 22, No. 95.
Equating and scaling in the context of small-sample exams, such as credentialing exams for highly specialized professions, have received increased attention in recent research. Investigators have proposed a variety of both classical and Rasch-based approaches to the problem. This study attempts to extend past research by (1) directly comparing classical and Rasch techniques of equating exam scores when sample sizes are small (N ≤ 100 per exam form) and (2) attempting to pool multiple forms' worth of data to improve estimation in the Rasch framework. We simulated multiple years of a small-sample exam program by resampling from a larger certification exam program's real data. Results showed that combining multiple administrations' worth of data via the Rasch model can lead to more accurate equating than classical methods designed to work well in small samples. WINSTEPS-based Rasch methods that used multiple exam forms' data worked better than Bayesian Markov chain Monte Carlo methods, because the prior distribution used to estimate the item difficulty parameters biased predicted scores when there were difficulty differences between exam forms.
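The resampling design described in the abstract above can be illustrated with a minimal sketch. It assumes mean equating as a stand-in classical method (the abstract does not say which classical methods were compared), and the function names, pool data, and parameters are hypothetical. It shows only how equating error can be estimated by repeatedly drawing small samples from a larger pool; it does not implement the Rasch or Bayesian methods the study evaluated.

```python
import random
import statistics


def mean_equate(new_form_scores, ref_form_scores):
    """Return a function that maps a new-form raw score onto the
    reference-form scale by shifting it by the difference in form means."""
    shift = statistics.mean(ref_form_scores) - statistics.mean(new_form_scores)
    return lambda score: score + shift


def simulate_small_sample_error(pool_new, pool_ref, n=50, reps=200, seed=0):
    """Draw `n` examinees per form from each pool `reps` times and return the
    root mean squared error of the estimated shift, treating the shift
    computed from the full pools as the criterion."""
    rng = random.Random(seed)
    true_shift = statistics.mean(pool_ref) - statistics.mean(pool_new)
    squared_errors = []
    for _ in range(reps):
        sample_new = rng.sample(pool_new, n)  # sampling without replacement
        sample_ref = rng.sample(pool_ref, n)
        # Equating a score of 0 returns the estimated shift itself.
        estimated_shift = mean_equate(sample_new, sample_ref)(0)
        squared_errors.append((estimated_shift - true_shift) ** 2)
    return (sum(squared_errors) / reps) ** 0.5
```

Calling `simulate_small_sample_error` with a small `n` and again with a larger `n` on the same pools gives a rough sense of how much equating error shrinks as more data become available, which is the motivation for pooling several administrations.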