A teacher survey can provide insight into teachers' own perceptions of their teaching quality and their intentions, thought processes, knowledge, and beliefs. In contrast to external raters, teachers also have full knowledge of the classroom context, for example the background to the performance of specific students (Goe et al., 2008). Although teacher surveys can stimulate teacher reflection, underperforming teachers might not have the metacognitive competence to judge their own professional skills accurately, and high-performing teachers might underestimate theirs (Kruger & Dunning, 1999).

Another way of measuring teaching practices is by means of instructional collections and artifacts. An instructional collection, also called a portfolio, is a collection of materials that teachers compile to provide evidence of their fulfillment of predetermined standards (Goe et al., 2008). Examples of such evidence are lesson plans, assignments, reflective writings, and samples of student work (Gitomer & Bell, 2010). An artifact protocol is a much narrower type of instructional collection and can, for example, focus on the quality of the student assignments that teachers provide. Building an instructional collection can stimulate teacher reflection and can help teachers improve. It also provides insight into students' learning opportunities on a day-to-day basis. However, it is a time-consuming enterprise for both teachers and assessors, and one might question to what extent teachers' exemplary work reflects their everyday classroom activities.

Another teaching quality measure, in which not the teaching process but the outcomes of teaching are evaluated, is a measure of a teacher's added value (Goe et al., 2008). Since schools and teachers teach different student populations, it would be unfair to compare teachers solely on the performance of their students.
In value-added models, students' prior educational attainment, their background characteristics, and the school composition are often taken into account to make comparisons of teachers' output fairer (Timmermans, Bosker, Doolaard, & de Wolf, 2012). Such measures enable the evaluation of teachers' contribution to student learning in a cost-effective and non-intrusive way, since most of the required data (test scores) have already been collected for other purposes (Goe et al., 2008). However, the use of value-added models in the context of education is a controversial issue. A review

1.5 A framework in support of the development, selection, and use of COSs

Recent publications show that generating valid and reliable scores by means of a COS is not self-evident (e.g.,