With growing interest in the role of teachers as the key mediators between educational policies and outcomes, the importance of developing good measures of classroom processes has become increasingly apparent. Yet, collecting reliable and valid information about a construct as complex as instruction poses important conceptual and technical challenges. This article summarizes the results of two studies that investigated the properties of measures of instruction based on a teacher-generated instrument (the Scoop Notebook) that combines features of portfolios and self-report. Classroom artifacts and teacher reflections were collected from samples of middle school science classrooms and rated along 10 dimensions of science instruction derived from the National Science Education Standards; ratings based on direct classroom observations were used for comparison. The results suggest that instruments that combine artifacts and self-reports hold promise for measuring science instruction with reliability similar to, and sizeable correlations with, measures based on classroom observation. We discuss the implications and lessons learned from this work for the conceptualization, design, and use of artifact-based instruments for measuring instructional practice in different contexts and for different purposes. Artifact-based instruments may illuminate features of instruction not apparent even through direct classroom observation; moreover, the process of structured collection and reflection on artifacts may have value for professional development. However, their potential value and applicability on a larger scale depend on careful consideration of the match between the instrument and the model of instruction, the intended uses of the measures, and the aspects of classroom practice most amenable to reliable scoring through artifacts.
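The reliability comparison above rests on generalizability theory (listed among the keywords). As a minimal illustration only, the sketch below assumes a fully crossed classrooms × raters design with one rating per cell; the studies' actual facets (for example, dimensions or occasions) and rater counts are not given in this section, and the function names `g_study_p_by_r` and `g_coefficients` are hypothetical. It estimates variance components from ANOVA mean squares and returns the relative (Eρ²) and absolute (Φ) coefficients for a chosen number of raters.

```python
from statistics import mean


def g_study_p_by_r(scores):
    """Estimate variance components for a crossed classrooms x raters design.

    Assumes scores[p][r] is the rating rater r gave classroom p, with exactly
    one rating per cell (so interaction and error are confounded).
    Negative estimates are truncated at zero, a common convention.
    """
    n_p = len(scores)
    n_r = len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(scores[p][r] for p in range(n_p)) for r in range(n_r)]

    # ANOVA mean squares for classrooms, raters, and the residual.
    ms_p = n_r * sum((m - grand) ** 2 for m in row_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in col_means) / (n_r - 1)
    ss_res = sum(
        (scores[p][r] - row_means[p] - col_means[r] + grand) ** 2
        for p in range(n_p)
        for r in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    var_res = ms_res                         # sigma^2(pr,e): interaction + error
    var_p = max((ms_p - ms_res) / n_r, 0.0)  # sigma^2(p): differences between classrooms
    var_r = max((ms_r - ms_res) / n_p, 0.0)  # sigma^2(r): rater leniency or severity
    return var_p, var_r, var_res


def g_coefficients(var_p, var_r, var_res, n_raters):
    """Relative (E rho^2) and absolute (Phi) coefficients averaging over n_raters."""
    relative = var_p / (var_p + var_res / n_raters)
    absolute = var_p / (var_p + (var_r + var_res) / n_raters)
    return relative, absolute


if __name__ == "__main__":
    # Hypothetical ratings for 3 classrooms by 2 raters; rater 2 is
    # consistently one point more lenient. Not data from the studies.
    components = g_study_p_by_r([[1, 2], [3, 4], [5, 6]])
    print(g_coefficients(*components, n_raters=2))
```

In this hypothetical example the relative coefficient stays at 1.0 while the absolute coefficient falls to about 0.94: a constant rater offset does not change the rank order of classrooms, but it does matter when scores are interpreted against a fixed standard.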
We outline a research agenda for addressing unresolved questions and advancing theoretical and practical knowledge around the measurement of instructional practice.

© 2011 Wiley Periodicals, Inc. J Res Sci Teach 49: 2012

Keywords: science education; measurement of instruction; generalizability theory

There is growing consensus among researchers and policymakers about the importance of accurate, valid, and efficient measures of instructional practice in science classrooms. Instruction directly or indirectly mediates the success of many school improvement efforts; thus, accurate descriptions of what teachers do in classrooms as they attempt to implement reforms are key to understanding "what works" in education and, equally importantly, "how?" Many educational policies and programs rely on claims about the value of certain practices for improving student outcomes; for example, the No Child Left Behind legislation prompted schools to adopt scientifically based practices to improve the achievement of all students. Similarly, the reform teaching movement often recommends specific approaches to instruction designed to promote higher-level learning. More generally, the National Researc...