2018
DOI: 10.1111/emip.12211
How Robust Are Cross‐Country Comparisons of PISA Scores to the Scaling Model Used?

Abstract: The Programme for International Student Assessment (PISA) is an important international study of 15‐year‐olds' knowledge and skills. New results are released every 3 years and have a substantial impact upon education policy. Yet, despite its influence, the methodology underpinning PISA has received significant criticism. Much of this criticism has focused upon the psychometric scaling model used to create the proficiency scores. The aim of this article is therefore to investigate the robustness of cross‐country co…

Cited by 20 publications (24 citation statements); references 3 publications.
“…For PISA 2006, the absolute differences between the country means obtained by the 1PL and 2PL models were relatively small on average and the correlations between the country means from the 1PL and 2PL models were high, even though for a few countries, larger deviations (especially in reading) were observed. Furthermore, for PISA 2015 data, Jerrim et al (2018b) found negligible differences between the relative order of country means for the 1PL model and the 2PL model. For the TIMSS 1995 data set, the 1PL and the 3PL models were compared and the rank order in the country means was found to be very consistent (Brown et al, 2007).…”
Section: Change of the Scaling Model
confidence: 80%
“…In comparison with the five previous PISA cycles (PISA 2000, 2003, 2006, 2009, and 2012), several substantial changes were implemented in the administration and analysis of PISA 2015 (for an overview of changes, see OECD, 2016, Annex 5). In this article, we focus on two substantial changes (but see Jerrim et al, 2018b, for a broad discussion of other changes). First, instead of a one-parameter logistic (1PL) model (Rasch, 1960), in which only the difficulty parameters for the items are estimated, a two-parameter logistic (2PL) model (Birnbaum, 1968), which estimates an additional discrimination parameter for each item, was used to scale the data.…”
Section: Introduction
confidence: 99%
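The 1PL/2PL distinction quoted above can be sketched numerically. A minimal illustration of the two item response functions, using hypothetical item parameters rather than PISA's actual operational values:

```python
import math

def irf_2pl(theta, b, a=1.0):
    """Two-parameter logistic item response function: probability of a
    correct response at ability theta, for an item with difficulty b and
    discrimination a. Fixing a = 1.0 for every item reduces the model to
    the 1PL (Rasch) case, where only difficulty varies across items."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item with difficulty b = 0.5, respondent ability theta = 1.0.
p_1pl = irf_2pl(theta=1.0, b=0.5)          # slope fixed at 1 (Rasch)
p_2pl = irf_2pl(theta=1.0, b=0.5, a=2.0)   # 2PL: steeper, more discriminating item
```

Under the 2PL model, items with larger discrimination parameters contribute more information near their difficulty level, which is why the change of scaling model can shift country means even when item difficulties are unchanged.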
“…For South Korea, there are four items with large negative DIF effects (a relative advantage) and no items with large positive DIF effects (a relative disadvantage) that are most strongly down-weighted (see [10]). Hence, it can be concluded that the choice of a particular linking method has the potential to impact the ranking of countries in PISA (see also [48,49]).…”
Section: Empirical Example: PISA 2006 Reading Competence
confidence: 99%
“…Jerrim et al. () examine the impact of the changes in the methods of analysis of PISA responses from 2012 to 2015 that were due to a change in contractors. The four changes were as follows:

1. Allowing some item‐by‐country interactions in item parameter estimation.
2. Shifting from a one‐parameter psychometric model (Rasch) to a two‐parameter item response model, thereby allowing items to differ in discrimination as well as in difficulty.
3. Changing the treatment of “not‐reached” items from incorrect to missing.
4. Using historical data in the estimation of the parameters of certain items.…”
Section: Addressing Robustness (D)
confidence: 99%
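The third change listed above, rescoring “not‐reached” items as missing rather than incorrect, can be illustrated with a toy scoring function. This is a hypothetical sketch of the general principle, not PISA's actual scoring code:

```python
def proportion_correct(responses, not_reached_as_incorrect):
    """Score a response vector in which None marks a not-reached item.
    Treating not-reached items as incorrect (the pre-2015 convention)
    counts them as zeros; treating them as missing excludes them from
    the denominator entirely."""
    if not_reached_as_incorrect:
        scored = [0 if r is None else r for r in responses]
    else:
        scored = [r for r in responses if r is not None]
    return sum(scored) / len(scored)

resp = [1, 1, 0, None, None]  # respondent ran out of time on two items
old = proportion_correct(resp, not_reached_as_incorrect=True)   # 2/5
new = proportion_correct(resp, not_reached_as_incorrect=False)  # 2/3
```

Because the missing-data treatment raises scores for respondents who leave many items unreached, countries where non-completion is common can gain relative to others, which is one reason this change matters for cross-country comparisons.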
“…They focus on PISA and PIAAC though, mutatis mutandis, the issues apply to all ILSAs. Jerrim et al (2018) examine the impact of the changes in the methods of analysis of PISA responses from 2012 to 2015 that were due to a change in contractors. The four changes were as follows:…”
confidence: 99%