A Statistical Test for Differential Item Pair Functioning

Bechger, Timo M.; Maris, Gunter

doi:10.1007/s11336-014-9408-y

Cited by 52 publications

(61 citation statements)

References 53 publications

“…In contrast to our study, Crins et al did not find DIF for any of the items flagged for DIF in our study that were also included in the PROMIS Physical Function v1.2 item bank [49]. It has been suggested that such differences can occur because most available DIF methods can detect whether there is DIF but cannot identify the exact DIF items due to parameter identification issues [56]. Our study and the study of Crins et al, found minimal impact of language DIF on T-scores, which suggests that the original US item parameters can be used for calculating the T-scores of the DF-PROMIS-UE v2.0 bank.…”

Section: Discussion (contrasting)

confidence: 99%

Graded response model fit, measurement invariance and (comparative) precision of the Dutch-Flemish PROMIS® Upper Extremity V2.0 item bank in patients with upper extremity disorders

Lameijer, Bruggen, Haan, et al. 2020

BMC Musculoskelet Disord

Background: The Dutch-Flemish PROMIS® Upper Extremity (DF-PROMIS-UE) V2.0 item bank was recently developed using Item Response Theory (IRT). Unknown for this bank are: (1) if it is legitimate to calculate IRT-based scores for short forms and Computerized Adaptive Tests (CATs), which requires that the items meet the assumptions of and fit the IRT-model (Graded Response Model [GRM]); (2) if it is legitimate to compare (sub)groups of patients using this measure, which requires measurement invariance; and (3) the precision of the estimated patients' scores for patients with different levels of functioning and compared to legacy measures. Aims were to evaluate (1) the assumptions of and fit to the GRM, (2) measurement invariance and (3) (comparative) precision of the DF-PROMIS-UE v2.0. Methods: Cross-sectional data were collected in Dutch patients with upper extremity disorders. Assessed were IRT-assumptions (unidimensionality [bi-factor analysis], local independence [residual correlations], monotonicity [coefficient H]), GRM item fit, measurement invariance (absence of Differential Item Functioning [DIF] due to age, gender, center, duration, and location of complaints) and precision (standard error of IRT-based scores across levels of functioning). To study measurement invariance for language (Dutch vs. English), additional US data were used. Legacy instruments were the Disability of the Arm, Shoulder and Hand (DASH), the QuickDASH and the Michigan Hand Questionnaire (MHQ).

“…Although the concept of DIF seems straightforward, some problems have been highlighted in among others a recent study by Bechger and Maris (2014) and are mostly related to comparing parameters that are not identified from the observations. Bechger and Maris (2014) proposed using a differential item pair functioning DIF test, which focuses on comparing item pairs instead of seeing DIF as an item property.…”

Section: Differential Item Functioning (mentioning)

confidence: 99%

“…The procedure starts with a separate calibration of the data within each group. There exists an overall test for DIF, which under the null hypothesis that there is no DIF follows a Chi-square distribution with the number of items minus one degrees of freedom (Bechger and Maris 2014). If an item pair in the calibration of one group has a different relative difficulty when compared to the relative difficulty in the calibration of the second group, that item pair is subject to DIF.…”

Section: Differential Item Functioning (mentioning)

confidence: 99%
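
The item-pair comparison described in the statement above can be illustrated with a minimal sketch. It assumes Rasch-type difficulty estimates are already available from a separate calibration in each group; the numbers and the helper names (`relative_difficulties`, `pairwise_dif`) are hypothetical and are not taken from Bechger and Maris (2014). The sketch only computes between-group differences in relative difficulty. It does not compute the overall chi-square statistic, which would also require the covariance matrices of the estimates.

```python
import numpy as np

def relative_difficulties(delta):
    """Matrix R with R[i, j] = delta[i] - delta[j] (difficulty of item i relative to item j)."""
    delta = np.asarray(delta, dtype=float)
    return delta[:, None] - delta[None, :]

def pairwise_dif(delta_g1, delta_g2):
    """Between-group difference in relative difficulty for every item pair."""
    return relative_difficulties(delta_g1) - relative_difficulties(delta_g2)

# Hypothetical difficulty estimates from separate calibrations (illustrative values only).
g1 = np.array([-1.0, 0.0, 0.5, 1.2])
g2 = np.array([-0.5, 0.5, 1.0, 2.5])  # items 0-2 shifted by 0.5, item 3 shifted by 1.3

d = pairwise_dif(g1, g2)

# Degrees of freedom of the overall test as quoted above: number of items minus one.
df = len(g1) - 1

print(np.round(d, 2))
print("degrees of freedom:", df)
```

In this toy example the pairs among items 0 to 2 have zero difference, while every pair involving item 3 differs between groups. Adding a constant to all difficulties in one group leaves the result unchanged, which is why the comparison is made on pairs rather than on single item parameters.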

Differential Item Functioning in PISA Due to Mode Effects

Feskens, Fox, Zwitser 2019

Theoretical and Practical Advances in Computer-Based Educational Measurement

One of the most important goals of the Programme for International Student Assessment (PISA) is assessing national changes in educational performance over time. These so-called trend results inform policy makers about the development of ability of 15-year-old students within a specific country. The validity of those trend results prescribes invariant test conditions. In the 2015 PISA survey, several alterations to the test administration were implemented, including a switch from paper-based assessments to computer-based assessments for most countries (OECD 2016a). This alteration of the assessment mode is examined by evaluating if the items used to assess trends are subject to differential item functioning across PISA surveys (2012 vs. 2015). Furthermore, the impact on the trend results due to the change in assessment mode of the Netherlands is assessed. The results show that the decrease reported for mathematics in the Netherlands is smaller when results are based upon a separate national calibration.

“…For practical analysis, therefore, further assumptions need to be made. One that is often used in practice is to expect the majority of items to be DIF‐free (e.g., Angoff, ; Bechger & Maris, ; Koretz & McCaffrey, ; Pohl, Stets, & Carstensen, ). This approach follows the logic that the majority of test items work as intended and only a few items show DIF.…”

mentioning

confidence: 99%

A Comparison of Aggregation Rules for Selecting Anchor Items in Multigroup DIF Analysis

Huelmann, Debelak, Strobl 2019

J Educational Measurement

This study addresses the topic of how anchoring methods for differential item functioning (DIF) analysis can be used in multigroup scenarios. The direct approach would be to combine anchoring methods developed for two‐group scenarios with multigroup DIF‐detection methods. Alternatively, multiple tests could be carried out. The results of these tests need to be aggregated to determine the anchor for the final DIF analysis. In this study, the direct approach and three aggregation rules are investigated. All approaches are combined with a variety of anchoring methods, such as the “all‐other purified” and “mean p‐value threshold” methods, in two simulation studies based on the Rasch model. Our results indicate that the direct approach generally does not lead to more accurate or even to inferior results than the aggregation rules. The min rule overall shows the best trade‐off between low false alarm rate and medium to high hit rate. However, it might be too sensitive when the number of groups is large. In this case, the all rule may be a good compromise. We also take a closer look at the anchor selection method “next candidate,” which performed rather poorly, and suggest possible improvements.
