Bayesian network models offer a large degree of flexibility for modeling dependence among observables (item outcome variables) from the same task. This paper explores four design patterns for modeling locally dependent observations from the same task:
• No context: ignore dependence among observables.
• Compensatory context: introduce a latent variable, context, to model task-specific knowledge and use a compensatory model to combine it with the relevant proficiencies.
• Inhibitor context: introduce a latent variable, context, to model task-specific knowledge and use an inhibitor (threshold) model to combine it with the relevant proficiencies.
• Compensatory cascading: model each observable as dependent on the previous one in sequence.
This paper explores these design patterns through experiments with simulated and real data. When the proficiency variable is categorical, a simple Mantel-Haenszel procedure can test for local dependence. Although local dependence can cause problems in calibration, when models based on these design patterns are successfully calibrated to data, all of the design patterns appear to provide very similar inferences about the students. Based on these experiments, the simpler no-context design pattern appears to be more stable than the compensatory context model, while not significantly affecting the classification accuracy of the assessment. The cascading design pattern seems to pick up dependencies missed by the other models and should be explored in further research.
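To make the Mantel-Haenszel check mentioned above concrete, here is a minimal sketch assuming two dichotomous observables from the same task and a categorical proficiency used as the stratifying variable. The variable names, data layout, and simulated data are illustrative assumptions, not details taken from the paper; the study's actual procedure may differ.

```python
import numpy as np
from scipy.stats import chi2

def mantel_haenszel_test(x, y, stratum):
    """Cochran-Mantel-Haenszel test of association between two dichotomous
    observables (x, y) after stratifying on a categorical proficiency.

    A significant statistic suggests residual (local) dependence between the
    observables that the proficiency variable alone does not explain.
    """
    num, den = 0.0, 0.0
    for k in np.unique(stratum):
        xk, yk = x[stratum == k], y[stratum == k]
        a = np.sum((xk == 1) & (yk == 1))              # both observables correct
        n1, n0 = np.sum(xk == 1), np.sum(xk == 0)      # margins for x
        m1, m0 = np.sum(yk == 1), np.sum(yk == 0)      # margins for y
        N = len(xk)
        if N < 2:
            continue
        num += a - n1 * m1 / N                          # observed minus expected count
        den += n1 * n0 * m1 * m0 / (N ** 2 * (N - 1))   # hypergeometric variance
    stat = (abs(num) - 0.5) ** 2 / den                  # continuity-corrected CMH statistic
    return stat, chi2.sf(stat, df=1)                    # p-value from chi-square(1)

# Illustrative use: x and y are 0/1 scores on two observables from one task,
# theta is a hypothetical categorical proficiency (e.g., low/medium/high).
rng = np.random.default_rng(0)
theta = rng.integers(0, 3, size=500)
x = rng.binomial(1, 0.3 + 0.2 * theta)
y = rng.binomial(1, 0.3 + 0.2 * theta)
print(mantel_haenszel_test(x, y, theta))
```

In this simulated example the observables are conditionally independent given the proficiency, so the test should usually not reject; injecting a shared task effect into x and y would tend to produce a significant statistic.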
This evaluation study compares the performance of a prototype tool called SourceFinder against that of highly trained human test developers. SourceFinder, a specialized search engine developed to locate source material for Graduate Record Examinations (GRE®) reading comprehension passages, employs a variety of shallow linguistic features to model the search criteria used by expert test developers, to automate the source selection process, and to reduce source-processing time. The evaluation provides detailed information about the aspects of source variation that are not well modeled by the current prototype, and approaches for enhancing performance in the identified areas are discussed. The study also provides a more explicit description of the source selection task and a rich data set for developing a less subjective, more explicit definition of the types of documents preferred by test developers.
This paper describes a Bayesian network model for a candidate assessment design that had four proficiency variables and 48 tasks with 3-12 observable outcome variables per task, as well as scale anchors to identify the location of the subscales. The domain experts' view of the relationships among proficiencies and tasks established a complex prior distribution over 585 parameters. Markov chain Monte Carlo (MCMC) estimation recovered the parameters of data simulated from the expert model. The sample size and the strength of the prior had only a modest effect on parameter recovery, but did affect the standard errors of the estimated parameters. Finally, an identifiability issue involving relabeling of proficiency states and permutation of the matrices is addressed in the context of this study.
Scope of Studies
Bayesian networks form an attractive class of models for diagnostic assessments because they can relate multiple proficiency variables to multiple dependent observables coming from the same task. The candidate assessment design posed several modeling challenges:
1. There were multiple correlated proficiencies, which had to be estimated simultaneously.
2. The observable outcome variables were grouped into tasks of varying complexity; each task tapped one or more proficiencies.
3. Tasks had different target difficulties.
4. The tasks were to be presented in a complex balanced incomplete block design to ensure sufficient concentration around each of seven potential proficiencies.
This paper describes simulation studies designed to answer the question "Can an assessment with these characteristics be modeled with a Bayesian network?" assuming no local dependence between observables in the same task. It reports a number of parameter recovery experiments based on the assessment design described in Section 2; the methodology was roughly similar across experiments.
Key words and phrases: Bayesian networks, model construction, Markov chain Monte Carlo (MCMC) estimation, prior elicitation, scale anchor.
Note: Later versions of this assessment were named iSkills™. The work described here is based on a very preliminary model of ICT literacy that has undergone considerable refinement since these studies were carried out; the models described in this paper do not reflect current thinking about the construct or the scoring methods used in the iSkills assessment.
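To illustrate the parameter-recovery methodology described above, here is a minimal sketch under simplifying assumptions: a single dichotomous observable linked to a known three-level proficiency. The generating probabilities, level names, and sample sizes are invented for illustration; the actual study used MCMC (StatShop) to estimate 585 parameters jointly with latent proficiencies, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical generating model: a three-level proficiency and one observable.
p_theta = np.array([0.3, 0.5, 0.2])       # P(proficiency = low / medium / high)
p_correct = np.array([0.2, 0.55, 0.85])   # P(correct | proficiency level)

def simulate(n):
    """Draw proficiencies and item responses from the known generating model."""
    theta = rng.choice(3, size=n, p=p_theta)
    x = rng.binomial(1, p_correct[theta])
    return theta, x

def recover(theta, x):
    # With proficiency treated as observed, recovery reduces to conditional
    # proportions; in the real study the proficiencies are latent, which is
    # why MCMC estimation is needed.
    return np.array([x[theta == k].mean() for k in range(3)])

for n in (200, 1000, 5000):               # sample size mainly affects precision
    theta, x = simulate(n)
    est = recover(theta, x)
    print(n, np.round(est, 3), "abs. error:", np.round(np.abs(est - p_correct), 3))
```

The pattern mirrors the experiments' logic: simulate data from the expert model with known parameters, re-estimate, and compare the estimates (and their standard errors as the sample size grows) with the generating values.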
Author note (for the design-patterns article summarized above): Joris Mulder is now with Utrecht University. Sections 3, 4, and 5 of that article are based on work Joris Mulder did while a summer intern at ETS; an expanded description of those experiments is available in his final report. The authors thank the team doing the parallel IRT analysis, including Brad Moulder, Frank Jenkins, Bruce Kaplan, Quncai Zhou, and Youn-Hee Lim, for generously sharing their results; Peggy Redman and Irv Katz for providing much of the ECD documentation for the tasks used in the ICT literacy assessment; Alexander Matukhin, who did much of the programming for the StatShop software used in the analysis; Bob Mislevy and David Williamson for helpful suggestions and cheerful encouragement throughout the project; Dan Eignor and Lou DiBello for suggestions that improved the clarity of the presentation; and David Rindskopf for finding two anonymous reviewers whose questions and insights helped improve the article.
Two automated editing tasks developed in a Phase I study were subjected to item analyses, revised, and then used in a computer-based test administration at a local college. The data collected in the administration were compared with questionnaire data obtained from students to examine the construct validity of the tasks. In a second approach to construct validation, a taxonomy of writing skills was developed and compared with the skills assessed by the editing tasks. Data analyses indicate that total editing score correlates more strongly with self-reported English grades than with self-reported mathematics grades, and that total editing score correlates positively with student self-assessments of their writing skill, recent grades on writing assignments, and college grade point average. A review of the task elements against the taxonomy indicates that the editing tasks assess important writing skills not assessed by free-response essays. For this and other reasons, it was concluded that automated editing tasks would serve as a useful complement to free-response writing assessments.