2005
DOI: 10.1002/j.2333-8504.2005.tb01988.x

Ensuring the Fairness of GRE Writing Prompts: Assessing Differential Difficulty

Abstract: … found that were large enough to warrant the removal of prompts from the item pool. Several potential causes of high DIF values for some prompts are discussed with respect to the content characteristics of these prompts.

Cited by 13 publications (18 citation statements)
References 26 publications
“…Where test-taker gender is concerned, Breland, Bridgeman, and Fowles (1999), Breland, Lee, Najarian, and Muraki (2004), and Broer, Lee, Rizavi, and Powers (2005) have found instances of differential item functioning (DIF) in favor of female test takers in six different performance writing tests, to a magnitude of up to 0.2 of a standard deviation. The authors caution, though, that the direction and size of the differences are highly sensitive to sample selection, and the findings should not be generalized beyond the exams studied.…”
Section: Test-taker Characteristics (mentioning)
confidence: 99%
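
To make the quoted "0.2 of a standard deviation" concrete, the sketch below shows a generic pooled-SD standardized mean difference between a focal and a reference group. This is an illustration only: the cited studies report DIF statistics computed with formal matching procedures rather than this raw contrast, and the function name and inputs are assumptions, not details from those papers.

    import numpy as np

    def standardized_mean_difference(focal_scores, reference_scores):
        """Group mean difference expressed in pooled standard-deviation units.

        Illustrative only: a value near 0.2 matches the upper end of the
        gender differences described in the citing passage above.
        """
        focal = np.asarray(focal_scores, dtype=float)
        ref = np.asarray(reference_scores, dtype=float)
        # Pooled variance across the two groups
        pooled_var = (
            (focal.size - 1) * focal.var(ddof=1)
            + (ref.size - 1) * ref.var(ddof=1)
        ) / (focal.size + ref.size - 2)
        return (focal.mean() - ref.mean()) / np.sqrt(pooled_var)

Operational DIF analyses additionally condition on a matching variable rather than comparing raw group means; the procedures named in the next citation statement do exactly that.
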
“…In psychometric studies of tests with real data it is common to find items that display UDIF, although NUDIF items can also be found (Broer et al. 2005; Ferreres et al. 2000, 2002; Gierl et al. 1999; Hambleton and Rogers 1989; Hauser and Huang 1996; Padilla et al. 1998; Prieto et al. 1999). Among the methods for the detection of NUDIF, the modified Mantel-Haenszel procedure (Mazor et al. 1994), logistic regression (Swaminathan and Rogers 1990), the Crossing SIBTEST (Li and Stout 1996), and log-linear models (Mellenbergh 1982) stand out.…”
mentioning
confidence: 93%
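
As a rough illustration of the logistic-regression DIF screen mentioned in the quote (Swaminathan and Rogers 1990), the sketch below fits three nested models for one dichotomous item: matching score only, plus a group main effect (uniform DIF), plus a score-by-group interaction (nonuniform DIF), with likelihood-ratio tests between them. The function name, the use of statsmodels, and the choice of matching score are assumptions made for illustration, not details taken from the cited studies.

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    def logistic_regression_dif(item, matching_score, group):
        """Screen one dichotomous item for uniform and nonuniform DIF.

        item           : 0/1 responses to the studied item
        matching_score : ability proxy (e.g., total or rest score) -- assumed
        group          : 0 = reference group, 1 = focal group
        """
        y = np.asarray(item, dtype=float)
        theta = np.asarray(matching_score, dtype=float)
        g = np.asarray(group, dtype=float)

        # Model 1: matching variable only
        m1 = sm.Logit(y, sm.add_constant(theta)).fit(disp=0)
        # Model 2: add a group main effect (uniform DIF)
        m2 = sm.Logit(y, sm.add_constant(np.column_stack([theta, g]))).fit(disp=0)
        # Model 3: add a score-by-group interaction (nonuniform DIF)
        m3 = sm.Logit(
            y, sm.add_constant(np.column_stack([theta, g, theta * g]))
        ).fit(disp=0)

        # Likelihood-ratio tests between nested models (1 df each)
        return {
            "uniform_dif_p": chi2.sf(2 * (m2.llf - m1.llf), df=1),
            "nonuniform_dif_p": chi2.sf(2 * (m3.llf - m2.llf), df=1),
        }

In practice a flag from such a screen is usually paired with an effect-size criterion before an item or prompt is considered for removal, since p-values alone are highly sensitive to sample size.
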
“…One method of examining the effectiveness of sensitivity reviews has been to analyze the extent to which reviewers' item evaluations coincide with the results of item bias analyses. A number of studies report that test reviewers perform no better than chance when asked to identify a priori which test items will demonstrate statistical bias (e.g., Broer, Lee, Rizavi, & Powers; Engelhard, Hansche, & Rutledge; Plake; Sandoval & Miille; Young) or survey items that will be nonequivalent across languages (Carter et al.). Our examination of 15 books on the subject of assessment suggested that some writers use this evidence as a basis for stating that although qualitative test reviews are sometimes done, they are not necessarily useful practices, as individuals have not proven effective at identifying biased items.…”
Section: Introduction (mentioning)
confidence: 99%