2021
DOI: 10.1007/s41237-021-00144-w

A multidimensional generalized many-facet Rasch model for rubric-based performance assessment

Abstract: Performance assessment, in which human raters assess examinee performance in a practical task, often involves the use of a scoring rubric consisting of multiple evaluation items to increase the objectivity of evaluation. However, even when using a rubric, assigned scores are known to depend on characteristics of the rubric’s evaluation items and the raters, thus decreasing ability measurement accuracy. To resolve this problem, item response theory (IRT) models that can estimate examinee ability while consideri…

Cited by 8 publications
(7 citation statements)
References 57 publications
“…The MFRM is the most common type of model used for IRT with rater parameters (Linacre, 1989). Furthermore, there are various alternative models such as a two-parameter logistic model with rater severity parameters (Patz & Junker, 1999), generalized partial credit models incorporating various rater parameters (Uto, 2021b; Uto & Ueno, 2020), hierarchical rater models (DeCarlo, Kim, & Johnson, 2011; Patz, Junker, Johnson, & Mariano, 2002; Qiu, Chiu, Wang, & Chen, 2022), extensions based on signal detection models (DeCarlo, 2005; Soo Park & Xing, 2019), rater bundle models (Wilson & Hoskens, 2001), and trifactor models (Shin et al., 2019). However, this study focuses on the MFRM because it is the most widely used and well-established of these models.…”
Section: Many-facet Rasch Models For Rater Severity Drift
Citation type: mentioning
confidence: 99%
“…For complex models, however, it is not generally feasible to derive or calculate the marginal posterior distribution due to there being high-dimensional multiple integrals. MCMC, a random sampling-based estimation method, has been widely used in various fields to address this problem, including in IRT studies (Brooks, Gelman, Jones, & Meng, 2011; Fontanella et al., 2019; Fox, 2010; Uto, 2021b; Uto & Ueno, 2020; van Lier et al., 2018; Zhang, Xie, You, & Huang, 2011).…”
Section: Proposed Model
Citation type: mentioning
confidence: 99%
“…The unbiased essay scores θ_j in the model can be estimated from observed essay rating data U while considering rater bias effects in a manner similar to that of the traditional GPCM, which can estimate examinee abilities while considering the effects of item characteristics. IRT models with rater parameters, including GMFRM, have been widely used for various performance tests, including essay writing tests and speaking tests, not only to realize an accurate ability or score estimation but also to analyze effects of various bias factors such as rater bias (e.g., [8]-[13], [35], [41]-[43], [58]).…”
Section: B. IRT Models With Rater Parameters
Citation type: mentioning
confidence: 99%
“…In short, one cannot simply average the scores and call the output knowledge or learning. While some attempts have been made in the psychometric literature to measure knowledge with rubrics (e.g., Uto, 2021), to our knowledge, none have explicitly modeled the censoring problem. In this paper, we show how to model and estimate this problem in a robust, understandable, way.…”
Section: Introduction
Citation type: mentioning
confidence: 99%