2005
DOI: 10.1002/j.2333-8504.2005.tb01988.x

Ensuring the Fairness of GRE Writing Prompts: Assessing Differential Difficulty

Abstract: … found that were large enough to warrant the removal of prompts from the item pool. Several potential causes of high DIF values for some prompts are discussed with respect to the content characteristics of these prompts.

Cited by 13 publications (18 citation statements)
References 26 publications
“…Where test-taker gender is concerned, Breland, Bridgeman, and Fowles (1999), Breland, Lee, Najarian, and Muraki (2004), and Broer, Lee, Rizavi, and Powers (2005) have found instances of differential item functioning (DIF) in favor of female test takers in six different performance writing tests, to a magnitude of up to 0.2 of a standard deviation. The authors caution, though, that the direction and size of the differences are highly sensitive to sample selection, and the findings should not be generalized beyond the exams studied.…”
Section: Test-taker Characteristics (mentioning)
confidence: 99%
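
To make the quoted "0.2 of a standard deviation" concrete, the sketch below shows a generic pooled-SD standardized mean difference between a focal and a reference group. This is an illustration only: the cited studies report DIF statistics computed with formal matching procedures rather than this raw contrast, and the function name and inputs are assumptions, not details from those papers.

    import numpy as np

    def standardized_mean_difference(focal_scores, reference_scores):
        """Group mean difference expressed in pooled standard-deviation units.

        Illustrative only: a value near 0.2 matches the upper end of the
        gender differences described in the citing passage above.
        """
        focal = np.asarray(focal_scores, dtype=float)
        ref = np.asarray(reference_scores, dtype=float)
        # Pooled variance across the two groups
        pooled_var = (
            (focal.size - 1) * focal.var(ddof=1)
            + (ref.size - 1) * ref.var(ddof=1)
        ) / (focal.size + ref.size - 2)
        return (focal.mean() - ref.mean()) / np.sqrt(pooled_var)

Operational DIF analyses additionally condition on a matching variable rather than comparing raw group means; the procedures named in the next citation statement do exactly that.
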
“…In psychometric studies of tests with real data it is common to find items that display UDIF, although NUDIF items can also be found (Broer et al. 2005; Ferreres et al. 2000, 2002; Gierl et al. 1999; Hambleton and Rogers 1989; Hauser and Huang 1996; Padilla et al. 1998; Prieto et al. 1999). Among the methods for the detection of NUDIF, the modified Mantel-Haenszel procedure (Mazor et al. 1994), logistic regression (Swaminathan and Rogers 1990), the Crossing SIBTEST (Li and Stout 1996), and log-linear models (Mellenbergh 1982) stand out.…”
mentioning
confidence: 93%
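
As a rough illustration of the logistic-regression DIF screen mentioned in the quote (Swaminathan and Rogers 1990), the sketch below fits three nested models for one dichotomous item: matching score only, plus a group main effect (uniform DIF), plus a score-by-group interaction (nonuniform DIF), with likelihood-ratio tests between them. The function name, the use of statsmodels, and the choice of matching score are assumptions made for illustration, not details taken from the cited studies.

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    def logistic_regression_dif(item, matching_score, group):
        """Screen one dichotomous item for uniform and nonuniform DIF.

        item           : 0/1 responses to the studied item
        matching_score : ability proxy (e.g., total or rest score) -- assumed
        group          : 0 = reference group, 1 = focal group
        """
        y = np.asarray(item, dtype=float)
        theta = np.asarray(matching_score, dtype=float)
        g = np.asarray(group, dtype=float)

        # Model 1: matching variable only
        m1 = sm.Logit(y, sm.add_constant(theta)).fit(disp=0)
        # Model 2: add a group main effect (uniform DIF)
        m2 = sm.Logit(y, sm.add_constant(np.column_stack([theta, g]))).fit(disp=0)
        # Model 3: add a score-by-group interaction (nonuniform DIF)
        m3 = sm.Logit(
            y, sm.add_constant(np.column_stack([theta, g, theta * g]))
        ).fit(disp=0)

        # Likelihood-ratio tests between nested models (1 df each)
        return {
            "uniform_dif_p": chi2.sf(2 * (m2.llf - m1.llf), df=1),
            "nonuniform_dif_p": chi2.sf(2 * (m3.llf - m2.llf), df=1),
        }

In practice a flag from such a screen is usually paired with an effect-size criterion before an item or prompt is considered for removal, since p-values alone are highly sensitive to sample size.
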
“…One method of examining the effectiveness of sensitivity reviews has been to analyze the extent to which reviewers' item evaluations coincide with the results of item bias analyses. A number of studies report that test reviewers perform no better than chance when asked to identify a priori which test items will demonstrate statistical bias (e.g., Broer, Lee, Rizavi, & Powers; Engelhard, Hansche, & Rutledge; Plake; Sandoval & Miille; Young) or survey items that will be nonequivalent across languages (Carter et al.). Our examination of 15 books on the subject of assessment suggested that some writers use this evidence as a basis for stating that although qualitative test reviews are sometimes done, they are not necessarily useful practices, as individuals have not proven effective at identifying biased items.…”
Section: Introduction (mentioning)
confidence: 99%