2001
DOI: 10.1002/j.2333-8504.2001.tb01851.x

Using a New Statistical Model for Testlets to Score TOEFL

Abstract: Standard item response theory (IRT) models fit to examination responses ignore the fact that sets of items (testlets) are often matched with a single common stimulus (e.g., a reading comprehension passage). In this setting, all items given to an examinee are unlikely to be conditionally independent (given examinee proficiency). Models that assume conditional independence will overestimate the precision with which examinee proficiency is measured. Overstatement of precision may lead to inaccurate inferences as w…

Cited by 41 publications (62 citation statements)
References 25 publications
“…The reliability of the test was overestimated when the standard Rasch model was employed. One should note that although the parameter estimates based on the two models are very similar, the information function based on the standard Rasch model could be imprecise since local dependence ensuing from shared item formats is ignored (Ip, 2010; Wainer & Wang, 2000). In general, ignoring LID due to methods can act like LID due to common stimuli and lead to inaccurate estimation of reliability and standard errors of the primary ability dimension (Bradlow, Wainer, & Wang, 1999; Marais & Andrich, 2008).…”
Section: Discussion (mentioning)
confidence: 95%
“…Most empirical examinations of this issue, using testlet model have focused on local dependence due to common stimuli. For example, Wainer and Wang (2000) analyzed the reading and listening comprehension sections of the Test of English as a Foreign Language with the 3-PL TRT model. They compared the difficulty, discrimination, and guessing parameters estimated by the TRT model with those obtained from a standard IRT model.…”
Section: Previous Applications Of Testlet Model In Educational Testing (mentioning)
confidence: 99%
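As a rough illustration of the 3-PL TRT model named in the statement above, the sketch below computes the probability of a correct response when a person-by-testlet effect is added to the standard 3-PL model. The parameterization a(θ − b − γ) is the form commonly given for testlet response theory and is assumed here rather than taken from this page; the function name `trt_3pl_prob` and all numeric values are hypothetical.

```python
import math


def trt_3pl_prob(theta, a, b, c, gamma=0.0):
    """Probability of a correct response under a 3-PL testlet model (sketch).

    theta: examinee proficiency
    a, b, c: item discrimination, difficulty, and guessing parameters
    gamma: person-by-testlet effect (assumed parameterization);
           gamma=0 reduces this to the standard 3-PL model
    """
    z = a * (theta - b - gamma)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# Hypothetical values: the same item with and without a testlet effect.
p_standard = trt_3pl_prob(theta=0.5, a=1.2, b=0.0, c=0.2)
p_testlet = trt_3pl_prob(theta=0.5, a=1.2, b=0.0, c=0.2, gamma=0.4)
```

Because γ is shared by every item attached to the same stimulus, responses within a testlet stay correlated even after conditioning on θ, which is the dependence a standard IRT model ignores.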
“…Since the constituent measurement model for the rule-space methodology is an IRT model that produces estimated ability values, models that include parameters for testlet dependencies (e.g., Wainer & Wang, 2001; Wang, Bradlow, & Wainer, 2002) could be used. In order to model multiple strategies or stages of development, models are also available (e.g., Wilson, 1989) even though the ability estimates from the different classes would have to be comparable in meaning for the rule-space analysis to be interpretable also.…”
Section: Rule-space Analysis (mentioning)
confidence: 99%
“…The use of unidimensional IRT models in conditions incompatible with the local independence assumption leads to lower estimations owing to the requirement of standard error values in individuals' ability parameter estimations (Wainer & Wang, 2001). In addition, the traits measured in the fields of education and psychology are generally so complicated that they cannot be grouped under a single dimension.…”
Section: Introduction (mentioning)
confidence: 99%