1978
DOI: 10.1002/j.2333-8504.1978.tb01151.x
Eliminating Differentially Difficult Items as an Approach to Test Bias

Abstract: In this study, a substantial number of items that were appreciably more difficult for Black than for White students were eliminated from a verbal test and from a mathematical test. Shortening the tests by this method had relatively little effect on score differences between Black and White students. This method of selecting items for elimination resulted in tests that were decidedly more difficult for both student groups. These outcomes clearly limit the value of this method for use in developing tests.

Cited by 10 publications (8 citation statements) | References 4 publications
“…Flaugher & Schrader (1978) removed a substantial number of items from the SAT-V and SAT-M that were appreciably more difficult for black than for white students and found that removal of the items had little effect on total score differences between black and white groups. In fact, removing these items made the tests more difficult for both groups.…”
Section: Factor Analysis Methods
confidence: 99%
“…Our aim was to create tests that would adhere to operational content and statistical specifications but at the same time exhibit Black/White differences in scores as small and as large as possible under the constraints imposed by the test specifications and the size of our available item pool. To our knowledge this has never been done before, although related research may be found in: Flaugher and Schrader (1978), Green (1972), Ironson and Craig (1982), Kok et al (1985) and Subkoviak et al (1984).…”
Section: Introduction
confidence: 91%
“…and Validity. There are a number of relationships between impact, item difficulty, test reliability, and validity that must be considered when we interpret the results of the approaches used in this paper. The effect of these relationships has also constrained the results of previous studies such as Hackett, Holland, Pearlman, & Thayer (1987) and Flaugher & Schrader (1978), but was not explicitly recognized.…”
Section: Impact
confidence: 97%
“…Some authors assume any differences are irrelevant (e.g., Rosser, 1987; Weiss, 1987) and argue that the fairest test is assembled by choosing items that minimize group mean score differences, regardless of the effect on the construct intended to be measured by the test. A variation on this theme is the argument, based on the work of some researchers (e.g., Carlton and Harris, 1992; Gallagher, 1992; Schmitt and Crone, 1991), that groups differ on average on classes of items, and that fairer tests may be produced by changing the construct being measured and eliminating just those classes of items (e.g., Flaugher and Schrader, 1978). Still other authors argue that the only irrelevant differences are those remaining after conditioning on test score, which has led to the study of Differential Item Functioning (DIF; Holland and Thayer, 1988).…”
Section: Introduction
confidence: 99%