Continuous norming methods have seldom been subjected to scientific review. In this simulation study, we compared parametric with semi-parametric continuous norming methods in psychometric tests by constructing a fictitious population model within which a latent ability increases with age across seven age groups. We drew samples of different sizes (n = 50, 75, 100, 150, 250, 500 and 1,000 per age group) and simulated the results of an easy, a medium, and a difficult test scale based on Item Response Theory (IRT). We subjected the resulting data to different continuous norming methods and compared the data fit under the different test conditions against a representative cross-validation dataset of n = 10,000 per age group. The largest differences were found for suboptimal (i.e., too easy or too difficult) test scales and for ability levels far from the population mean. We discuss the results with regard to the selection of appropriate modeling techniques in psychometric test construction, the required sample sizes, and the need to report appropriate quantitative and qualitative test quality criteria for continuous norming methods in test manuals.
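The IRT-based simulation of easy, medium, and difficult scales described above can be illustrated with a minimal sketch. This is not the authors' code; it is a Python illustration assuming a one-parameter (Rasch) model, with hypothetical ability distributions and item difficulties.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_scale(abilities, difficulties):
    """Simulate dichotomous 1PL (Rasch) responses and return raw sum scores.
    P(correct) = logistic(theta - b) for each person/item pair."""
    theta = np.asarray(abilities)[:, None]
    b = np.asarray(difficulties)[None, :]
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    responses = rng.random(p.shape) < p
    return responses.sum(axis=1)

# Hypothetical example: n = 100 simulees from one age group,
# scored on an "easy" 20-item scale (difficulties centred below 0 logits).
abilities = rng.normal(loc=0.5, scale=1.0, size=100)
easy_items = np.linspace(-2.0, 0.0, 20)
raw = simulate_scale(abilities, easy_items)
```

An easy scale like this produces raw-score distributions with ceiling effects at high ability levels, which is exactly where the norming methods in the study diverged most.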
In this article, we explain and demonstrate how to model norm scores with the cNORM package in R. This package is designed specifically to determine norm scores when the latent ability to be measured covaries with age or other explanatory variables such as grade level. The mathematical method used in this package draws on polynomial regression to model a three-dimensional hyperplane that smoothly and continuously captures the relation between raw scores, norm scores, and the explanatory variable. By doing so, it overcomes typical problems of classical norming methods, such as overly large age intervals, missing norm scores, large amounts of sampling error in the subsamples, or excessive sample-size requirements. After a brief introduction to the mathematics of the model, we describe the individual methods of the package. We close the article with a practical example using data from a real reading comprehension test.
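The core idea of the polynomial hyperplane can be sketched outside R as well. The following Python sketch fits raw scores as a polynomial in norm score and age by plain least squares; cNORM itself additionally performs best-subset selection over the Taylor-polynomial terms, which is omitted here, and all function names and the synthetic surface are hypothetical.

```python
import numpy as np
from itertools import product

def fit_norm_surface(raw, norm, age, degree=3):
    """Least-squares fit of raw ~ sum_ij c_ij * norm^i * age^j.
    (cNORM selects the best subset of such terms; this sketch keeps
    all terms up to `degree` in each variable.)"""
    terms = list(product(range(degree + 1), repeat=2))
    X = np.column_stack([norm**i * age**j for i, j in terms])
    coef, *_ = np.linalg.lstsq(X, raw, rcond=None)
    return terms, coef

def predict_raw(terms, coef, norm, age):
    """Evaluate the fitted polynomial surface at new (norm, age) points."""
    X = np.column_stack([norm**i * age**j for i, j in terms])
    return X @ coef

# Hypothetical usage: norm scores as z-values, age in years,
# with a fictitious true raw-score surface.
rng = np.random.default_rng(1)
norm = rng.uniform(-2, 2, 500)
age = rng.uniform(6, 12, 500)
raw = 20 + 5 * norm + 1.5 * age + 0.5 * norm * age
terms, coef = fit_norm_surface(raw, norm, age, degree=2)
pred = predict_raw(terms, coef, norm, age)
```

Because the fitted surface is continuous in age, a norm score can be read off for any age value, not just for the discrete age bands of the normative sample; this is what removes the large-interval and missing-norm problems mentioned above.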
Abstract. The aim of this study was a systematic comparison of different methods for modeling norm data. The semi-parametric norming approach (SPCN) based on Taylor polynomials, implemented in cNORM (Lenhard, Lenhard & Gary, 2018), was compared with parametric fits based on Generalized Additive Models for Location, Scale and Shape (GAMLSS; Stasinopoulos et al., 2018), and norming quality was analyzed as a function of the factors norm sample size (n = 525, 700, 1,050, 1,750), number of items (i = 10, 20, 40), and item difficulty. Modeling was cross-validated on the basis of simulated raw data from norming and validation samples: with each method, statistical models were computed on the norming sample and applied to the validation sample in order to compare the predicted with the actual norm scores. In most cases, the semi-parametric approach yielded the smallest norming error and thus the best norming result. The clearest differences were found for easy or difficult test scales combined with a small number of items. The influence of norm sample size was comparable across all methods.
We investigated whether the accuracy of normed test scores derived from non-demographically representative samples can be improved by combining continuous norming methods with compensatory weighting of test results. To this end, we introduce raking, a method from the social sciences, to psychometrics. In a simulated reference population, we modeled a latent cognitive ability with a typical developmental gradient, along with three demographic variables that were correlated to varying degrees with the latent ability. We simulated five additional populations representing patterns of non-representativeness that might be encountered in the real world. We subsequently drew smaller normative samples from each population and used a one-parameter logistic Item Response Theory (IRT) model to generate simulated test results for each individual. Using these simulated data, we applied norming techniques both with and without compensatory weighting. Weighting reduced the bias of the norm scores when the degree of non-representativeness was moderate, with only a small risk of generating new biases.
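Raking is iterative proportional fitting: cell weights are rescaled in turn until the sample's weighted margins match the known population margins. The abstract gives no implementation, so the following Python sketch (function names and numbers hypothetical) illustrates the procedure for a two-way demographic table.

```python
import numpy as np

def rake(cell_weights, row_target, col_target, n_iter=100, tol=1e-10):
    """Iterative proportional fitting (raking) for a two-way table:
    alternately rescale rows and columns until both weighted margins
    match the population targets. Targets must have equal totals."""
    w = np.asarray(cell_weights, dtype=float).copy()
    for _ in range(n_iter):
        w *= (row_target / w.sum(axis=1))[:, None]   # match row margins
        w *= (col_target / w.sum(axis=0))[None, :]   # match column margins
        if (np.abs(w.sum(axis=1) - row_target).max() < tol
                and np.abs(w.sum(axis=0) - col_target).max() < tol):
            break
    return w

# Hypothetical example: a normative sample (cell counts) that
# over-represents one demographic combination, raked to population margins.
sample = np.array([[10., 20.],
                   [30., 40.]])
weights = rake(sample,
               row_target=np.array([50., 50.]),
               col_target=np.array([40., 60.]))
```

The resulting cell weights would then multiply each individual's contribution to the norming model, so that under-represented demographic cells count more, which is the compensatory weighting evaluated in the study.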