2019
DOI: 10.31045/jes.2.2.2

Teacher Observation and Reliability: Additional Insights Gathered from Inter-rater Reliability Analyses

Abstract: Using a newly created teacher evaluation instrument, Inter-rater Reliability (IRR) analyses were conducted on four teacher videos as a means to establish instrument reliability. Raters included 42 principals and assistant principals in a southern US school district. The videos used spanned the teacher quality spectrum and the IRR findings across these levels varied. Key findings suggest that while the overall IRR coefficient may be adequate to assess the validity of a classroom observation instrument, the over…
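The abstract's central caution, that a respectable overall IRR coefficient can coexist with noticeably weaker agreement on individual videos, is easy to reproduce numerically. The sketch below is not the study's analysis: it uses Fleiss' kappa purely as a stand-in for the overall coefficient, with invented tallies for 42 raters scoring four videos on a 4-point scale, and reports per-video agreement alongside the pooled value.

```python
# A minimal sketch (not the authors' code) of how an overall multi-rater
# agreement coefficient can hide item-level differences. Fleiss' kappa is
# a stand-in for the study's IRR coefficient; the tallies are invented
# (4 videos x 42 raters, 4-point rating scale).
import numpy as np

def fleiss_kappa(counts):
    """counts: (n_items, n_categories) array; counts[i, j] = number of
    raters who placed item i in category j. Every item is rated by the
    same number of raters."""
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    # Per-item observed agreement P_i
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()                                # pooled observed agreement
    p_j = counts.sum(axis=0) / (n_items * n_raters)   # category proportions
    p_e = np.sum(p_j ** 2)                            # chance agreement
    return (p_bar - p_e) / (1 - p_e), p_i

# Hypothetical tallies: rows = 4 teacher videos, cols = rating categories 1-4.
counts = np.array([
    [36,  4,  2,  0],   # clear-cut video: raters largely agree
    [ 2, 30,  8,  2],
    [ 1, 12, 20,  9],   # mid-range video: agreement is noticeably weaker
    [ 0,  2,  6, 34],
])
kappa, per_item = fleiss_kappa(counts)
print(f"overall kappa = {kappa:.3f}")
print("per-video agreement:", np.round(per_item, 3))
```

Run on these invented counts, the pooled kappa looks acceptable while the mid-range video's agreement sits well below it, which is the pattern the abstract describes.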

Cited by 11 publications (5 citation statements)
References 21 publications
“…Fourteen are men and seven are women. We did an evaluation using Gwet's AC1 [22]. Gwet's AC1 can show the level of agreement between two experts.…”
Section: Discussion
confidence: 99%
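For readers unfamiliar with the coefficient this citing paper uses, the sketch below implements the standard two-rater form of Gwet's AC1 on invented categorical ratings. It illustrates the formula only; it is not code from the citing or the cited study.

```python
# A minimal sketch of two-rater Gwet's AC1 for nominal ratings of the
# same items. The example ratings are invented.
from collections import Counter

def gwet_ac1(ratings_a, ratings_b):
    """Two-rater Gwet's AC1 for nominal ratings of the same items."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    categories = sorted(set(ratings_a) | set(ratings_b))
    k = len(categories)
    # Observed agreement
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Average marginal proportion for each category across both raters
    counts = Counter(ratings_a) + Counter(ratings_b)
    pi = {c: counts[c] / (2 * n) for c in categories}
    # Chance agreement under Gwet's formulation
    p_e = sum(pi[c] * (1 - pi[c]) for c in categories) / (k - 1)
    return (p_a - p_e) / (1 - p_e)

expert_1 = ["yes", "yes", "no", "yes", "yes", "yes", "no", "yes"]
expert_2 = ["yes", "yes", "no", "yes", "no",  "yes", "no", "yes"]
print(f"Gwet's AC1 = {gwet_ac1(expert_1, expert_2):.3f}")
```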
“…After the validity test, twenty-five student answer diagrams were used, with a data reliability value of 0.959. Since this research aimed to establish a standardized assessment based on expert consensus, we also tested inter-rater reliability [22] amongst the experts. The average measure of the intraclass correlation is 0.929, based on nineteen experts.…”
Section: Discussion
confidence: 99%
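The "average measure" figure quoted above is the kind of value produced by an average-measures intraclass correlation. The sketch below computes Shrout and Fleiss's ICC(2,k) from a subjects-by-raters score matrix using the usual two-way ANOVA mean squares; the ICC model actually used in the citing study is not stated here, and the scores are invented.

```python
# A minimal sketch of an average-measures intraclass correlation,
# ICC(2,k): two-way random effects, absolute agreement, average of k
# raters. The data are invented (5 diagrams x 4 experts).
import numpy as np

def icc_2k(scores):
    """scores: (n_subjects, k_raters) matrix of numeric ratings."""
    y = np.asarray(scores, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

scores = np.array([
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 5, 5, 4],
    [3, 3, 4, 3],
    [1, 2, 1, 1],
])
print(f"ICC(2,k) average measures = {icc_2k(scores):.3f}")
```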
“…This can involve using the kappa statistic [37,38,39] or Gwet's AC1 [40,41,42] to measure a method's results and Cronbach's alpha [43,44] to ensure the data's reliability. However, Gwet's AC1 may be better than the kappa statistic for assessment cases [45,46,47,48]. Software reuse papers tested their approaches using precision and recall, with none utilizing similarity measurements.…”
Section: What Are the Parameters (Measuring Instruments) Used To Measure The Similarity Between Two Software Products?
confidence: 99%
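The claim that Gwet's AC1 "may be better than the kappa statistic" usually refers to the kappa paradox: with heavily skewed categories, Cohen's kappa can come out low even when raw agreement is high, while AC1 stays close to the observed agreement. The self-contained sketch below reproduces that behaviour on invented two-rater data; it is an illustration of the general point, not an analysis from any of the cited papers, and the category labels are hypothetical.

```python
# A minimal comparison of Cohen's kappa and Gwet's AC1 on the same
# two-rater ratings, showing the "kappa paradox" with skewed categories.
# The ratings are invented.
from collections import Counter

def cohen_kappa(a, b):
    n = len(a)
    cats = sorted(set(a) | set(b))
    p_a = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in cats)   # product of marginals
    return (p_a - p_e) / (1 - p_e)

def gwet_ac1(a, b):
    n = len(a)
    cats = sorted(set(a) | set(b))
    p_a = sum(x == y for x, y in zip(a, b)) / n
    counts = Counter(a) + Counter(b)
    pi = {c: counts[c] / (2 * n) for c in cats}
    p_e = sum(pi[c] * (1 - pi[c]) for c in cats) / (len(cats) - 1)
    return (p_a - p_e) / (1 - p_e)

# 100 items, both raters heavily favour "similar": 90 joint "similar",
# 2 joint "different", 8 disagreements.
rater_1 = ["similar"] * 90 + ["different"] * 2 + ["similar"] * 5 + ["different"] * 3
rater_2 = ["similar"] * 90 + ["different"] * 2 + ["different"] * 5 + ["similar"] * 3
print(f"observed agreement = {sum(x == y for x, y in zip(rater_1, rater_2)) / 100:.2f}")
print(f"Cohen's kappa      = {cohen_kappa(rater_1, rater_2):.3f}")
print(f"Gwet's AC1         = {gwet_ac1(rater_1, rater_2):.3f}")
```

On these invented data the observed agreement is 0.92, yet Cohen's kappa falls below 0.3 because the marginal distributions are so lopsided, while AC1 remains above 0.9.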
“…The basis for administrators' concerns is founded in the respective, but often conflated and inconsistent, purposes and methods of teacher supervision and evaluation (Zepeda & Jimenez, 2019). For example, teacher evaluation can be useful for removing underperforming teachers (Grissom & Bartanen, 2018); however, the much larger majority of teachers need a system that provides formative feedback which can be used to improve instructional practices (Mette et al., 2015; Stark et al., 2017).…”
Section: Background and Conceptual Framework
confidence: 99%