There are a number of cases where local item dependence (LID) may be expected and is worth examining because it can capture meaningful phenomena. We consider two general types of LID: one where the dependence is related to the item order (be it a factual or a conceptual order, as explained below), and one where the dependence refers to the gestalt that a set of items can form. We term these order dependency and combination dependency, respectively. For both, we give some examples.
In this article an item response model is introduced for repeated ratings of student work, which we have called the Rater Bundle Model (RBM). Development of this model was motivated by the observation that when repeated ratings occur, the assumption of conditional independence is violated; hence current state-of-the-art item response models, such as the rater facets model, that ignore this violation underestimate measurement error and overestimate reliability. In the rater bundle model these dependencies are explicitly parameterized. The model is applied to both real and simulated data to illustrate the approach.

The main point of this article is to introduce a model for repeated ratings of student work, which we have dubbed the rater bundle model (RBM), and to report on its usefulness in a number of contexts. The motivation for developing this model is our belief that the current "state-of-the-art" item response models being applied to this situation do not fully express our understanding of what is going on when a rater re-rates a piece of student work that has already been rated by another rater. We will call this the "repeated rating" problem. Although this observation has probably been made before, the first time it was noted in the literature was by Patz (1996), at least as far as we are aware. Following this observation, several researchers have begun developing alternative models and indices:
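The central claim, that treating dependent repeated ratings as if they were independent overstates reliability, can be illustrated with a small simulation. All parameter values below are illustrative assumptions, not figures from the article: two raters share a common rating-occasion effect, so their errors are correlated, and the Spearman-Brown projection that assumes independent ratings overestimates the reliability of the averaged score.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
theta = rng.normal(0, 1, n)        # true proficiency
shared = rng.normal(0, 0.6, n)     # occasion effect shared by both raters (the dependence)
e1 = rng.normal(0, 0.6, n)         # rater 1 unique error
e2 = rng.normal(0, 0.6, n)         # rater 2 unique error
r1 = theta + shared + e1           # rater 1 score
r2 = theta + shared + e2           # rater 2 score

rho = np.corrcoef(r1, r2)[0, 1]    # inter-rater correlation (single-rating reliability estimate)
sb = 2 * rho / (1 + rho)           # Spearman-Brown for the 2-rating average, assumes independent errors
avg = (r1 + r2) / 2
actual = np.corrcoef(avg, theta)[0, 1] ** 2   # actual reliability of the averaged rating
print(f"assumed (Spearman-Brown): {sb:.3f}, actual: {actual:.3f}")
```

Because the shared occasion effect inflates the inter-rater correlation without adding true-score information, the assumed reliability comes out well above the actual one, which is the pattern the abstract describes.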
In this article we describe the derivation of a taxonomy of personality-descriptive nouns. We argue that, contrary to traditional statements, nouns deserve their own special place in the domain of personality language. The ultimate aim is to provide a sound basis for the development of a representative and effective instrument for registering judgements on personality. Study 1 describes the steps that were followed to arrive at a list of personality-descriptive nouns. Fourteen subjects took part, with different numbers of subjects at the various stages of selecting the nouns. Seven hundred and fifty-five nouns resulted from this study. Study 2 (N = 400) describes the determination of the internal structure of the domain of nouns through factor analysis of both self and partner ratings obtained from 200 Dutch-speaking Belgian subjects and 200 Dutch subjects. By applying a method of rotation to perfectly congruent weights, the noun structure turned out to be invariant across self and partner conditions and across the different groups of subjects. The results show the existence of a well-delineated multidimensional noun structure comparable to that of adjectives and of verbs.
In this study, patterns of variation in the severities of a group of raters over time, or so-called "rater drift," were examined when raters scored an essay written under examination conditions. At the same time, feedback was given to rater leaders (called "table leaders"), who then interpreted the feedback and reported to the raters. Rater severities in five successive periods were estimated using a modified linear logistic test model (LLTM; Fischer, 1973) approach. It was found that the raters did indeed drift towards the mean, but a planned comparison of the feedback with a control condition was not successful; this was believed to be due to contamination at the table leader level. A series of models was also estimated that was designed to detect other types of rater effects beyond severity: a tendency to use extreme scores and a tendency to prefer certain categories. The models for these effects showed significant improvements in fit, implying that these effects were indeed present, although they were difficult to detect in relatively short time periods.
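The role a rater-severity parameter plays in a Rasch-type model of this kind can be sketched in a simplified dichotomous form. This is not the article's estimation procedure, only an illustration of the functional form, and all parameter names and values below are assumptions: severity enters the logit alongside examinee proficiency and item difficulty, so a more severe rater lowers the probability of a positive rating for the same examinee and essay.

```python
import math

def rating_prob(theta, delta, severity):
    """Probability of a positive rating under a Rasch-type facets model:
    logit P = theta - delta - severity (larger severity -> lower P)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta - severity)))

# Same examinee (theta = 0.5) and essay (delta = 0.0),
# scored by a lenient rater vs. a severe rater:
lenient = rating_prob(theta=0.5, delta=0.0, severity=-0.5)  # logit = 1.0
severe = rating_prob(theta=0.5, delta=0.0, severity=0.8)    # logit = -0.3
```

Estimating a severity parameter per rater per period, as the modified LLTM approach does, then lets one track how these values move across the five scoring periods, which is what "drift towards the mean" refers to.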
We give an account of classical test theory (CTT) in terms of the more fundamental ideas of item response theory (IRT). This approach views CTT as a very general version of IRT, and the commonly used IRT models as detailed elaborations of CTT for special purposes. We then use this approach to CTT to derive some general results regarding the prediction of the true score of a test from an observed score on that test as well as from an observed score on a different test. This leads us to a new view of linking tests that were not developed to be linked to each other. In addition, we propose true-score prediction analogues of the Dorans and Holland measures of the population sensitivity of test linking functions. We illustrate the accuracy of the first-order theory using simulated data from the Rasch model and illustrate the effect of population differences using a set of real data.

Key words: test theory, true scores, best linear predictors, test linking, nonparallel tests, simulation, Rasch model

Acknowledgements: We would like to thank Neil Dorans, Skip Livingston, and two anonymous referees for many suggestions that have greatly improved this paper. The work reported here is collaborative in every respect and the order of authorship is alphabetical. It was begun while both authors were on the faculty at the
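In its simplest single-test form, the best-linear-predictor idea behind these true-score prediction results is Kelley's classical regression estimate, a standard CTT result: the observed score is shrunk toward the group mean in proportion to the test's unreliability. A minimal sketch, with illustrative numbers:

```python
def predict_true_score(x, reliability, mean):
    """Kelley's estimate of the true score: a weighted average of the
    observed score x and the group mean, weighted by reliability."""
    return reliability * x + (1 - reliability) * mean

# Observed score 80 on a test with reliability 0.75, group mean 60:
print(predict_true_score(80, 0.75, 60))  # 75.0
```

With perfect reliability the prediction is the observed score itself; with zero reliability it collapses to the group mean. Extending this logic to predicting the true score of one test from the observed score on a different, nonparallel test is what motivates the linking results in the abstract.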