Assessment of music performance in authentic contexts remains an underinvestigated area of research. This study examined one such context: interjudge reliability of faculty evaluations of end-of-semester applied music performances. Brass (n = 4), percussion (n = 2), woodwind (n = 5), voice (n = 5), piano (n = 3), and string (n = 5) instructors evaluating a recent semester's applied music juries at a large university participated in the study. Each evaluator completed a criterion-specific rating scale for each performer and assigned each performance a global letter grade that was not shared with other evaluators or with the performer. Interjudge reliability was determined for each group's rating scale total scores, subscale scores, and letter-grade assessments. All possible permutations of two, three, and four evaluators were examined for interjudge reliability, and averaged correlations, standard deviations, and ranges were determined. Full-panel interjudge reliability was consistently good regardless of panel size. All total score reliability coefficients were statistically significant, as were all coefficients for the global letter-grade assessment. All subscale reliabilities for all groups except Percussion (which, with an n of 2, had a stringent significance criterion) were statistically significant, with the exception of the Suitability subscale in Voice. For larger panels (ns of 4 and 5), rating scale total score reliability was consistently, although not greatly, higher than reliability for the letter-grade assessment. Average reliability did not decrease as group size incrementally decreased. Permutations of two and three evaluators, however, tended on average to exhibit more variability, greater range, and less uniformity than did groups of four and five. No differences in reliability were noted among levels of experience or between teaching assistants and faculty members. A minimum of five adjudicators was recommended for performance evaluation in this context.
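To make the permutation approach above concrete, the following Python sketch computes the average pairwise Pearson correlation of total scores for every possible evaluator panel of a given size, then summarizes the mean, standard deviation, and range across panels. The scores and the choice of Pearson correlations are illustrative assumptions, not the study's data or analysis code.

# Illustrative sketch (not the authors' code): average interjudge reliability
# for every possible evaluator panel of a given size.
from itertools import combinations
import numpy as np

# rows = performers, columns = evaluators; hypothetical total scores
scores = np.array([
    [88, 85, 90, 84, 87],
    [72, 70, 75, 69, 74],
    [95, 93, 96, 92, 94],
    [60, 65, 58, 63, 61],
    [80, 78, 82, 77, 81],
])

def panel_reliability(scores, panel):
    # mean pairwise Pearson r among the evaluators in `panel`
    rs = [np.corrcoef(scores[:, i], scores[:, j])[0, 1]
          for i, j in combinations(panel, 2)]
    return np.mean(rs)

n_judges = scores.shape[1]
for size in (2, 3, 4, 5):
    rels = [panel_reliability(scores, p)
            for p in combinations(range(n_judges), size)]
    print(f"panel size {size}: mean r = {np.mean(rels):.2f}, "
          f"SD = {np.std(rels):.2f}, range = {np.ptp(rels):.2f}")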
This study is the second in a series examining relationships among faculty, peer, and self-evaluations of end-of-semester applied music performances. At three locations, college and university voice, percussion, woodwind, brass, and stringed instrument instructors rated undergraduate performances. Later, the performers rated the same set of performances (one of which was their own) on videotape. Total score faculty interjudge reliability was mixed, ranging from .23 to .93. Total score interjudge reliability among student (peer) panels was more consistent (.83 to .89). Most category score reliabilities were acceptable, although there was a wide range. Consistent with results of the first investigation, correlations between faculty and peer evaluations generally were high, ranging from .61 (p < .10) to .98 (p < .01). Also consistent with results of the first investigation, self-evaluation correlated poorly with both faculty and peer evaluation. No significant differences in self-evaluation were found among performance concentrations (voice, percussion, etc.) or between preliminary-level (first or second year) and upper-level (third year and beyond) performance status.
Authorities agree that peer evaluation and self-evaluation can help improve teaching performance. Evaluation of applied music skills, however, remains heavily teacher-centered. In this investigation, I explored the efficacy of peer and self-evaluation of applied brass jury performances. In three episodes at two locations, university faculty members evaluated live brass jury performances using an author-constructed Brass Performance Rating Scale (BPRS). Also using the BPRS, students rated these same performances (one of which was their own) on videotape. To control for adjudicators' prior knowledge of performers, a fourth panel of adjudicators unfamiliar with the performers evaluated one episode's performances. Interjudge reliability for faculty and peer evaluation panels generally was high, with total score correlations ranging from .83 to .89 (p < .01). Correlations among faculty and peer-group evaluations also were high, with total score rs ranging from .86 to .91 (p < .01). Consonant with prior investigations, self-evaluation generally correlated poorly with faculty and peer evaluation. The effect of videotape seemed minimal; scoring discrepancies between live and videotaped performances were low. In this investigation, prior knowledge of performers did not seem to affect evaluations.
With this study, we examined four potential influences on American high school solo and small-ensemble festival adjudicator ratings: time of day, performing medium (vocal or instrumental), type of event (solo or ensemble), and school size. A total of 7,355 instrumental and vocal events from two consecutive midwestern state solo and ensemble festivals were analyzed. The two festivals, held in 2001 and 2002, employed 75 adjudicators (33 vocal and 42 instrumental). Statistically significant differences were found in the main effects of time of day, type of event, and school size. The average rating for all events moved toward "Superior" ("I") as the day progressed. This tendency, found in all size classifications except the largest, was most prevalent among events from mid-size schools. Large-school events received higher average ratings than did small-school events. Although preliminary analyses showed that small-school events were disproportionately held during morning hours, the interaction between time of day and school size was not significant. Significant time-of-day by performing-medium (vocal/instrumental) and type-of-event (solo/ensemble) by performing-medium interactions were found. The two performing media seemed to mirror each other's rating patterns. Vocal ensemble events were more likely to receive a Superior rating than were vocal solo events, whereas the opposite was true for instrumental events. Similar time-of-day tendencies were found in both festivals, despite almost entirely different adjudicators. The 2002 adjudicators, who represented a more even mix of public school and college teachers and were selected on different criteria, awarded significantly more Superior ratings.
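The kind of factorial analysis implied by these main effects and interactions can be sketched as follows. This is an assumed illustration: the column names (time, medium, event, school), the simulated ratings, and the specific model terms are stand-ins, not the authors' data or analysis script.

# Assumed sketch of a factorial analysis of festival ratings (simulated data).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "time":   rng.choice(["morning", "afternoon"], n),
    "medium": rng.choice(["vocal", "instrumental"], n),
    "event":  rng.choice(["solo", "ensemble"], n),
    "school": rng.choice(["small", "mid", "large"], n),
})
df["rating"] = rng.integers(1, 6, n)  # simulated 1-5 ratings (1 = "Superior")

model = ols("rating ~ C(time) * C(medium) + C(event) * C(medium) + C(school)",
            data=df).fit()
print(sm.stats.anova_lm(model, typ=2))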
The purpose of this study was to investigate the reliability and perceived pedagogical utility of a multidimensional weighted performance assessment rubric used in Kansas state high school large-group festivals. Data were adjudicator rubrics (N = 2,016) and adjudicator and director questionnaires (N = 515). Rubric internal consistency was moderately high (.88). Dimension reliability ranged from moderately low (W = .47) to moderate (W = .77). Total score reliability was moderately high (W = .80) and rating reliability was moderate (W = .72). Findings suggested that reliability on the whole was within the range of previously researched music performance assessment tools. Questionnaire results suggested that the rubric provided a better instrument for justifying ratings and more detailed descriptions of what constituted acceptable performances than previously researched nonrubric forms; hence, adjudicators and directors perceived the rubric as possessing improved pedagogical utility.
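The W values reported above are presumably Kendall's coefficient of concordance, the standard agreement statistic for rankings by multiple adjudicators. A minimal Python sketch of that statistic, using hypothetical adjudicator scores and omitting the correction for tied ranks, is shown below.

# Minimal sketch of Kendall's W for hypothetical adjudicator scores
# on a single rubric dimension (not the study's data).
import numpy as np
from scipy.stats import rankdata

# rows = adjudicators (m), columns = performing groups (n)
scores = np.array([
    [4, 3, 5, 2, 4, 3],
    [5, 3, 4, 2, 4, 3],
    [4, 2, 5, 3, 5, 3],
])

m, n = scores.shape
ranks = np.apply_along_axis(rankdata, 1, scores)  # rank within each adjudicator
R = ranks.sum(axis=0)                             # summed ranks per group
S = ((R - R.mean()) ** 2).sum()
W = 12 * S / (m ** 2 * (n ** 3 - n))              # no tie correction
print(f"Kendall's W = {W:.2f}")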