2019
DOI: 10.1177/0265532219879654

Evaluating subscore uses across multiple levels: A case of reading and listening subscores for young EFL learners

Abstract: Stakeholders of language tests are often interested in subscores. However, reporting a subscore is not always justified; a subscore should provide reliable and distinct information to be worth reporting. When a subscore is used for decisions across multiple levels (e.g., individual test takers and schools), it needs to be justified for its reliability and distinctiveness at every relevant level. In this study, we examined whether reporting seven Reading and Listening subscores of the TOEFL Primary® test, a sta…

Citations: cited by 7 publications (9 citation statements)
References: 50 publications (91 reference statements)
“…For instance, TOEFL iBT listening section was composed of 34 listening items measuring three major listening subskills (Lee & Sawaki, 2009) and shortened to 28 listening items in the Shorter TOEFL iBT® Test starting from August 1, 2019. The TOEFL Primary listening section consists of 30 items that assess four communication goals (Choi & Papageorgiou, 2020). It is therefore understandable that previous research on providing subscores has generally produced unsatisfactory results, claiming that subscores are not of adequate quality psychometrically (Papageorgiou & Choi, 2018), and that subscore-based inferences are supported only at group level but not at individual test taker level (Choi & Papageorgiou, 2020).…”
Section: Discussion (mentioning)
confidence: 99%
“…The TOEFL Primary listening section consists of 30 items that assess four communication goals (Choi & Papageorgiou, 2020). It is therefore understandable that previous research on providing subscores has generally produced unsatisfactory results, claiming that subscores are not of adequate quality psychometrically (Papageorgiou & Choi, 2018), and that subscore-based inferences are supported only at group level but not at individual test taker level (Choi & Papageorgiou, 2020). This study has demonstrated that CDA analyses, by using a smaller number of latent scale points in estimation than item response theory (IRT) analyses (Templin & Bradshaw, 2013), provide acceptable classification reliability and thus the possibility to meet the substantial demand from test users on more detailed information.…”
Section: Discussion (mentioning)
confidence: 99%
“…The second implication relates to current scoring practice in EAP assessment. Even if the EAP assessment or language assessment in general measures a multidimensional construct, current scoring practice in operational contexts is confined to unidimensional IRT models, primarily owing to the challenge of interpreting item parameters in MIRT models (Reise et al, 2014) and the complexity of communicating effectively with test stakeholders (Choi & Papageorgiou, 2020). Our study suggested that despite the presence of multidimensionality, fitting a unidimensional IRT model would not bias item parameter estimates for most grade clusters.…”
Section: Discussion (mentioning)
confidence: 99%
“…Listening tests often consist of multiple components targeting different communication goals (Choi and Papageorgiou, 2020). Scores on each component of the listening test, also called listening subscores, may provide added value over the total score.…”
Section: Grading and Awarding (mentioning)
confidence: 99%
“…Scores on each component of the listening test, also called listening subscores, may provide added value over the total score. To examine the justifiability of reporting subscores at the individual and school levels, Choi and Papageorgiou (2020) explored the reliability and distinctiveness of listening and reading subscores of the TOEFL Primary test. Four listening subscores based on different communication goals were targeted, that is, Monologue, Dialogue, Narrative, and Academic subscores.…”
Section: Grading and Awarding (mentioning)
confidence: 99%