Occasions and the Reliability of Classroom Observations: Alternative Conceptualizations and Methods of Analysis

Meyer, Joseph P.; Cash, Anne H.; Mashburn, Andrew J.

doi:10.1080/10627197.2011.638884

Cited by 28 publications (21 citation statements)

References 28 publications


“…In general, studies indicate that scores do not meet acceptable reliability standards (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014) when they are based on the number of observed lessons typical in teacher evaluations (e.g., Ho & Kane, 2013; Praetorius et al., 2012). In addition, ratings of instructional practices are less stable, and therefore require more observations, than those of classroom climate (Meyer et al., 2011; Praetorius et al., 2014). Two features of this research are relevant to our present study.…”
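The claim that less stable ratings need more observations can be made concrete with the classical Spearman-Brown prophecy formula, which projects the reliability of an average over n observations from the reliability of a single one. This is a minimal sketch under a classical test theory simplification (observations treated as parallel), not the generalizability-theory method used in the cited studies, and the single-lesson reliabilities below are hypothetical values chosen only for illustration.

```python
def spearman_brown(rho_single: float, n: int) -> float:
    """Projected reliability of the mean of n parallel observations,
    given the reliability of a single observation (Spearman-Brown)."""
    return n * rho_single / (1 + (n - 1) * rho_single)


def observations_needed(rho_single: float, target: float, max_n: int = 1000) -> int:
    """Smallest number of observations whose projected reliability reaches target.

    A linear search is used instead of the closed-form inverse so that
    floating-point rounding cannot push the answer up by one.
    """
    if not (0 < rho_single < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    for n in range(1, max_n + 1):
        if spearman_brown(rho_single, n) >= target - 1e-12:
            return n
    raise ValueError(f"target not reached within {max_n} observations")


if __name__ == "__main__":
    # Hypothetical single-lesson reliabilities (illustration only, not reported data).
    for label, rho in [("climate", 0.5), ("instruction", 0.3)]:
        print(label, observations_needed(rho, target=0.8))
```

With these assumed inputs, reaching 0.80 takes 4 observations for the more stable dimension and 10 for the less stable one, which mirrors the qualitative pattern described in the quotation.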

Section: Measuring Score Stability With Generalizability Theory (classified as mentioning; confidence: 99%)

The Reliability of Framework for Teaching Scores in Kindergarten

Patrick, French, Mantzicopoulos (2020). Journal of Psychoeducational Assessment.

We evaluated the score stability of the Framework for Teaching (FFT), a prominent observation instrument used for teacher evaluation. Three raters each scored 200 reading and mathematics lessons taught by 20 kindergarten teachers. Using generalizability theory analyses, we decomposed the FFT’s Classroom Environment, Instruction, and Total scores into potential sources of variation (teachers, lessons, raters, and their interactions). The proportions of score variance attributable to differences among teachers were 71% and 76% for Classroom Environment, 49% and 37% for Instruction, and 69% and 66% for the Total score, for reading and mathematics, respectively. Reliability estimates (G) ranged from 0.92 to 0.96 for Classroom Environment and Total scores; they were 0.87 and 0.79 for reading and mathematics Instruction. Decision studies indicated that two raters, each scoring three reading lessons or four mathematics lessons, are necessary to achieve sufficiently reliable Total scores. For Instruction scores, three raters each scoring seven reading lessons are needed; more than four raters each scoring eight lessons are needed for mathematics.
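The decision-study logic summarized above can be sketched in Python. The sketch assumes an (l:p) × r design, with lessons nested within teachers (each teacher taught their own lessons) and crossed with raters; that design and the variance components in the usage block are hypothetical assumptions, not the model or estimates reported in the article. Under this design the relative error variance is σ²(l:p)/n_l + σ²(pr)/n_r + σ²(lr:p,e)/(n_l·n_r), the G coefficient is σ²(p)/(σ²(p) + relative error), and the absolute error used by Phi additionally includes σ²(r)/n_r.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Tuple


@dataclass(frozen=True)
class VarianceComponents:
    """Hypothetical G-study variance components for an (l:p) x r design."""
    teacher: float            # sigma^2(p): universe-score (teacher) variance
    rater: float              # sigma^2(r): rater main effect
    lesson_in_teacher: float  # sigma^2(l:p): lessons nested within teachers
    teacher_rater: float      # sigma^2(pr): teacher-by-rater interaction
    residual: float           # sigma^2(lr:p,e): confounded residual


def relative_error(vc: VarianceComponents, n_lessons: int, n_raters: int) -> float:
    """Error variance for rank-ordering teachers (relative decisions)."""
    return (vc.lesson_in_teacher / n_lessons
            + vc.teacher_rater / n_raters
            + vc.residual / (n_lessons * n_raters))


def absolute_error(vc: VarianceComponents, n_lessons: int, n_raters: int) -> float:
    """Relative error plus the rater main effect (absolute decisions)."""
    return relative_error(vc, n_lessons, n_raters) + vc.rater / n_raters


def g_coefficient(vc: VarianceComponents, n_lessons: int, n_raters: int) -> float:
    return vc.teacher / (vc.teacher + relative_error(vc, n_lessons, n_raters))


def phi_coefficient(vc: VarianceComponents, n_lessons: int, n_raters: int) -> float:
    return vc.teacher / (vc.teacher + absolute_error(vc, n_lessons, n_raters))


def cheapest_design(vc: VarianceComponents, target: float,
                    max_lessons: int = 10, max_raters: int = 5) -> Optional[Tuple[int, int]]:
    """Return (n_lessons, n_raters) with the fewest total ratings whose G meets
    target, breaking ties by fewer lessons; None if no design in range does."""
    candidates = [
        (nl * nr, nl, nr)
        for nl, nr in product(range(1, max_lessons + 1), range(1, max_raters + 1))
        if g_coefficient(vc, nl, nr) >= target
    ]
    if not candidates:
        return None
    _, nl, nr = min(candidates)
    return nl, nr


if __name__ == "__main__":
    # Illustrative (made-up) variance components, not values from the study.
    vc = VarianceComponents(teacher=0.50, rater=0.02, lesson_in_teacher=0.15,
                            teacher_rater=0.03, residual=0.20)
    for nl, nr in [(1, 1), (3, 2), (7, 3)]:
        print(nl, nr, round(g_coefficient(vc, nl, nr), 3), round(phi_coefficient(vc, nl, nr), 3))
    print("cheapest design for G >= 0.80:", cheapest_design(vc, 0.80))
```

Minimizing total ratings is only one possible cost criterion; a real decision study might weight additional lessons and additional raters differently, which would change which design counts as cheapest.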


“…Therefore, the use of FFT to evaluate special education instruction could lead to an evaluation that is not aligned with the research base and that endorses practices that do not lead to improved outcomes for SWD. In addition, there is a growing body of research that indicates for observations to be useful for identifying effective teachers, or simply in improving classroom quality and teacher practice, these measures must use standardized observation protocols that minimize measurement error and permit valid inferences (Meyer, Cash, & Mashburn, 2011), which is work that has yet to be completed on FFT.…”

Section: Using Observation Tools For Special Education Teacher Evaluation (classified as mentioning; confidence: 99%)

Special Education Teacher Evaluation

Johnson, Semmelroth (2013). Assessment for Effective Intervention.

There is currently little consensus on how special education teachers should be evaluated. The lack of consensus may be due to several reasons. Special education teachers work under a variety of complex conditions, with a very heterogeneous population, and support student progress toward a very individualized set of goals. In addition, special education is marked by historically high rates of attrition, a lack of highly qualified teachers entering the field, and a number of special education teachers completing alternate certification programs, which together affect overall professional quality. In this article, we first review the challenges associated with evaluating special education teachers, describe and analyze current approaches, and present a conceptual framework for an approach to special education teacher evaluation. We then provide an overview of the Recognizing Effective Special Education Teachers (RESET) tool as a possible alternative to measure special education teacher effectiveness. Given the current zeitgeist of teacher evaluation systems that fail to address the unique circumstances related to special education teachers, it is hoped that the information in this article will contribute to the small but growing body of research on special education teacher evaluation and effectiveness.


“…In order to effectively evaluate teachers' classroom teaching performance, a number of factors should be taken into account, such as the evaluation time, curricula, and raters' preference (e.g., Goe et al., 2008; Meyer et al., 2012; Gitomer et al., 2014). For most Chinese colleges, teaching evaluations are compulsory.…”

Section: Introduction (classified as mentioning; confidence: 99%)

A Multivariate Generalizability Theory Approach to College Students' Evaluation of Teaching

Hou, Wang, et al. (2018). Frontiers in Psychology.

Teachers' teaching level evaluation is an important component of classroom teaching and professional promotion in institutions of higher learning in China. Many self-made questionnaires are currently administered to Chinese college students to evaluate teachers' classroom teaching performance. Because these questionnaires often lack strong foundations in educational and psychological measurement theory, their dependability remains open to doubt. Evaluation time points, the number of students, major type, and curriculum type were examined in relation to college students' perceptions of their teachers' classroom teaching performance, using the Teachers' Teaching Level Evaluation Scale for Colleges (TTLES-C). Data were collected from a sample of 556 students at two time points in three Chinese universities and were analyzed using multivariate generalizability theory. Results showed that evaluations at the beginning of the spring semester produced better outcomes than evaluations at the end of the fall semester, and that 20 student evaluators were sufficient to ensure good dependability. Results also revealed that the evaluation dependability of the science curriculum appeared higher than that of the liberal arts curriculum. Recommendations on the evaluation criteria and mode are discussed.

