2017
DOI: 10.1177/1029864917697782
Investigating adjudicator bias in concert band evaluations: An application of the Many-Facets Rasch Model

Abstract: Prior research indicates mixed findings regarding the consistency of adjudicators' ratings at large ensemble festivals, yet the results of these festivals have strong impacts on the perceived success of instrumental music programs and the perceived effectiveness of their directors. In this study, Rasch modeling was used to investigate the potential influence of adjudicators on performance ratings at a live large ensemble festival. Evaluation forms from a junior high school concert band festival adjudicated by …
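For context on the method named in the abstract, a many-facet Rasch model is conventionally written as an adjacent-category log-odds model with separate parameters for the performer, the evaluation criterion, the adjudicator, and the rating-scale thresholds. The formulation below is a standard sketch in Linacre-style notation, not necessarily the exact parameterization used in this study; the facet labels are assumed to mirror a band-festival setting.

```latex
% Standard many-facet Rasch model (MFRM); facet labels are illustrative.
\[
  \log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
    = \theta_n - \delta_i - \lambda_j - \tau_k
\]
% \theta_n  : achievement of ensemble n
% \delta_i  : difficulty of evaluation criterion i
% \lambda_j : severity of adjudicator j
% \tau_k    : threshold of rating category k relative to category k-1
% P_{nijk}  : probability that adjudicator j awards ensemble n category k
%             on criterion i
```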

Cited by 7 publications (7 citation statements)
References 15 publications
“…As a result, the between-subgroup outfit statistics appear to be useful tools for identifying systematic differences in the levels of severity that a rater exercises when assessing various subgroups. In contrast to other popular methods for detecting potential rater biases, such as bias/interaction analyses (e.g., Engelhard, 2008; Goodwin, 2016; Kondo-Brown, 2002; Springer & Bradley, 2018; Wesolowski et al., 2015; Winke et al., 2012), practitioners do not need to make multiple comparisons when interpreting the meaning of rater between-subgroup outfit statistics. In this article, we have argued that practitioners evaluating performance assessments should consider reporting rater between-subgroup outfit statistics in addition to rater total fit statistics when providing evidence of the fairness of those assessments.…”
Section: Results
mentioning
confidence: 99%
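As a rough illustration of the statistic discussed in that passage, the sketch below computes a rater's outfit mean-square separately within each examinee subgroup from standardized Rasch residuals. The column names and the toy data are hypothetical; in practice the expected scores and variances would come from a fitted many-facet Rasch model.

```python
import numpy as np
import pandas as pd

def between_subgroup_outfit(df):
    """Outfit (unweighted) mean-square for each rater within each subgroup.

    Assumes hypothetical columns: 'rater', 'subgroup', 'observed' (the rating),
    'expected' (model-expected rating), and 'variance' (model variance of the
    rating), all taken from a previously fitted many-facet Rasch model.
    """
    # Standardized residual for each observation
    z = (df["observed"] - df["expected"]) / np.sqrt(df["variance"])
    df = df.assign(z2=z ** 2)
    # Outfit MS is the mean squared standardized residual; values near 1.0
    # indicate model-consistent ratings, large values flag misfit.
    return df.groupby(["rater", "subgroup"])["z2"].mean().rename("outfit_ms")

# Hypothetical example with two raters and two subgroups
example = pd.DataFrame({
    "rater":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "subgroup": ["x", "x", "y", "y", "x", "x", "y", "y"],
    "observed": [3, 4, 2, 5, 4, 3, 3, 4],
    "expected": [3.2, 3.8, 3.1, 3.9, 3.5, 3.4, 3.3, 3.6],
    "variance": [0.8, 0.9, 0.7, 0.9, 0.8, 0.8, 0.7, 0.9],
})
print(between_subgroup_outfit(example))
```

Because each rater then receives one outfit value per subgroup, interpreted against the usual benchmark of values near 1.0, there is no family of pairwise rater-by-subgroup contrasts to test, which is the advantage over bias/interaction analyses noted in the quotation.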
“…In previous studies of rater-mediated assessments, researchers have proposed numerous indicators of rating quality that reflect various perspectives on what constitutes evidence of high-quality ratings, such as indicators of rater consistency (i.e., agreement or reliability) or fit to a measurement model (e.g., Meadows & Billington, 2005; Myford & Wolfe, 2003, 2004). As part of evaluating the fairness of rater-mediated assessments, many researchers have studied differential rater functioning (DRF), or raters' tendency to apply inconsistent levels of severity when they assess students in different subgroups (Engelhard, 2008; Goodwin, 2016; Kondo-Brown, 2002; Springer & Bradley, 2018; Wesolowski, Wind, & Engelhard, 2015; Winke, Gass, & Myford, 2012). When raters exhibit DRF, they may systematically underestimate or overestimate student achievement, depending on students' membership within a subgroup.…”
mentioning
confidence: 99%
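The bias/interaction analyses cited in that passage typically formalize DRF by adding a rater-by-subgroup interaction term to the many-facet Rasch model. The parameterization below is a generic sketch of that idea, not the exact specification of any of the cited studies.

```latex
% MFRM extended with a differential rater functioning (DRF) interaction term.
\[
  \log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
    = \theta_n - \delta_i - \lambda_j - \tau_k - \phi_{jg}
\]
% \phi_{jg} : interaction between rater j and subgroup g (the subgroup to
%             which examinee n belongs); an estimate meaningfully different
%             from zero flags systematic severity or leniency toward g.
```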
“…Simulation 1 aimed to examine the consequences of ignoring dual DRF effects. We created rating data for 200 examinees, five criteria, and three raters; these conditions are quite common in applied assessment research (e.g., Kondo-Brown, 2002; Springer & Bradley, 2018). Data generation followed the same general settings as in Jin and Wang (2018).…”
Section: Simulation 1: Consequences of Ignoring Dual DRF
mentioning
confidence: 99%
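To make that simulation design concrete, the sketch below generates polytomous ratings for 200 examinees, five criteria, and three raters from a rating-scale many-facet Rasch model, assuming a fully crossed design for simplicity. The generating distributions and thresholds are illustrative assumptions only; they are not the settings used by Jin and Wang (2018).

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXAMINEES, N_CRITERIA, N_RATERS, N_CATEGORIES = 200, 5, 3, 5

# Illustrative generating parameters (assumed, not Jin & Wang's settings)
theta = rng.normal(0.0, 1.0, N_EXAMINEES)   # examinee ability
delta = rng.normal(0.0, 0.5, N_CRITERIA)    # criterion difficulty
lam = rng.normal(0.0, 0.5, N_RATERS)        # rater severity
tau = np.array([-1.5, -0.5, 0.5, 1.5])      # rating-category thresholds

def rating_probs(ability, difficulty, severity):
    """Category probabilities under a rating-scale MFRM for one observation."""
    # Cumulative adjacent-category log-odds give the logit of each category
    logits = np.concatenate(([0.0], np.cumsum(ability - difficulty - severity - tau)))
    p = np.exp(logits - logits.max())
    return p / p.sum()

# ratings[n, i, j] = category (0..4) given by rater j to examinee n on criterion i
ratings = np.empty((N_EXAMINEES, N_CRITERIA, N_RATERS), dtype=int)
for n in range(N_EXAMINEES):
    for i in range(N_CRITERIA):
        for j in range(N_RATERS):
            ratings[n, i, j] = rng.choice(
                N_CATEGORIES, p=rating_probs(theta[n], delta[i], lam[j]))

print(ratings.shape, ratings.mean())
```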
“…The Queen Elisabeth Competition has also been studied by several groups of researchers, again mostly with a focus on juror bias due to potential distortions from non-musical factors such as order effects (Flôres and Ginsburgh, 1996; Glejser and Heyndels, 2001). In educational contexts, where it is sometimes easier to obtain complete data from the judges of musical competitions and recitals, there has been considerable research on best practices in rubric and rating-scale design, using the same or similar techniques to those we use here (Latimer et al., 2010; Wesolowski et al., 2016; Springer and Bradley, 2017; Álvarez-Díaz et al., 2020).…”
Section: Introduction
mentioning
confidence: 99%