In generalizability analyses, unstable, and potentially invalid, variance component estimates may result from using only a limited portion of the available data. Missing observations, however, are common in operational performance assessment settings because of the nature of the assessment design. This article describes a procedure for overcoming the computational and technological limitations of analyzing data with missing observations by extracting smaller, analyzable subsets from a sparsely filled data matrix. This subdividing method creates data sets that exhibit structural designs common in generalizability analyses, namely, the crossed, modified balanced incomplete block (MBIB), and nested designs. The validity of the subdividing method is examined with a Monte Carlo simulation, and the method is then demonstrated on an operational data set. Index terms: analysis of variance (ANOVA), crossed, nested, and MBIB designs, generalizability theory and method, interrater reliability, large-scale analysis, missing data, performance assessment, rater design, variance component.

In recent years, performance assessment has become popular as a means of assessing students because such assessments provide direct measures of nontraditional student outcomes. Generalizability theory (G theory), developed by Cronbach, Gleser, and Rajaratnam (1963), is often used in the development of performance assessments to identify the relative magnitudes of multiple sources of measurement error and to project how score reliability can be increased. Because these assessments are time-consuming to administer and score, examinees seldom respond to all test items and raters seldom evaluate all examinee responses. As a result, a common problem for those applying G theory to large-scale performance assessments is working with sparse data matrices (i.e., missing data). The purpose of this article is to develop a method for analyzing data sets with missing observations. The method is examined in relation to rater inconsistencies, the number of examinees tested, the number of raters employed, and the degree of standardization in distributing tasks to raters; these factors are discussed in detail in subsequent sections.

The article first describes the technical problems caused by missing observations in performance assessment data sets. It then reviews common approaches to these missing data problems and their limitations. Next, it introduces the new G theory technique and illustrates how to restructure and analyze a hypothetical sparsely filled data set so that it can be accommodated by currently available analytic methods. A Monte Carlo simulation is used to evaluate the statistical properties of the new method. Finally, the method is applied to a data set from a large-scale direct writing assessment, and the results are presented in terms of the comparability of the methods.
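The crossed case of the subdividing idea can be sketched briefly. The sketch below is a minimal illustration under stated assumptions, not the authors' published procedure: the function names (`crossed_variance_components`, `subdivide_and_pool`) are hypothetical, only the fully crossed design is handled, and sub-block estimates are pooled by a simple size-weighted average, which is one of several defensible pooling choices.

```python
import numpy as np

def crossed_variance_components(block):
    """Variance components for a fully crossed persons x raters block
    (one score per cell, no missing values), via the standard
    expected-mean-square equations for a two-facet p x r design."""
    n_p, n_r = block.shape
    grand = block.mean()
    ms_p = n_r * ((block.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
    ms_r = n_p * ((block.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
    resid = (block - block.mean(axis=1, keepdims=True)
                   - block.mean(axis=0, keepdims=True) + grand)
    ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))
    return {"person": max((ms_p - ms_pr) / n_r, 0.0),
            "rater": max((ms_r - ms_pr) / n_p, 0.0),
            "residual": ms_pr}

def subdivide_and_pool(scores):
    """Group examinees scored by an identical rater set into fully
    crossed sub-blocks, estimate components per block, and pool the
    estimates as a size-weighted average (NaN = missing score)."""
    blocks = {}
    for p in range(scores.shape[0]):
        raters = tuple(np.flatnonzero(~np.isnan(scores[p])))
        if len(raters) >= 2:
            blocks.setdefault(raters, []).append(p)
    pooled = {"person": 0.0, "rater": 0.0, "residual": 0.0}
    total = 0.0
    for raters, persons in blocks.items():
        if len(persons) < 2:
            continue
        sub = scores[np.ix_(persons, raters)]
        vc = crossed_variance_components(sub)
        for key in pooled:
            pooled[key] += sub.size * vc[key]
        total += sub.size
    return {key: value / total for key, value in pooled.items()}
```

A decision (D) study would then proceed from the pooled components as usual; handling the MBIB and nested sub-designs requires their own expected-mean-square equations, which the sketch omits.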
Large-scale studies frequently use complex sampling procedures, disproportionate sampling weights, and adjustment techniques to account for potential nonresponse bias and to ensure that results from the sample generalize to the larger population. Survey researchers are concerned with measurement error and with the use of weights in model development; consequently, multiple weighting factors are combined into a final composite survey weight that is available for analysis. We developed a method for incorporating such an external weighting factor into generalizability theory analyses of measurement error, giving researchers a tool to evaluate the error components that bear on survey quality and the undesirable error components of large-scale assessment programs such as national and state assessments.
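One plausible way to fold a single composite survey weight into a crossed-design G study is sketched below. This is an illustration under stated assumptions, not the estimator developed in the paper: person weights enter the sums of squares directly, the rater facet is left unweighted, the raw person count is replaced by an effective sample size, and `weighted_crossed_gstudy` is a hypothetical name.

```python
import numpy as np

def weighted_crossed_gstudy(scores, weight):
    """Sketch: fold a composite survey weight (one value per person)
    into the p x r crossed-design variance component equations by
    weighting each person's contribution to the sums of squares and
    replacing n_p with the effective sample size (sum w)^2 / sum w^2.
    Illustrative only -- not the paper's estimator."""
    n_p, n_r = scores.shape
    w = np.asarray(weight, dtype=float)
    w = w * n_p / w.sum()                      # normalize: weights sum to n_p
    n_eff = w.sum() ** 2 / (w ** 2).sum()      # effective number of persons
    p_mean = scores.mean(axis=1)               # person means over raters
    r_mean = np.average(scores, axis=0, weights=w)  # weighted rater means
    grand = np.average(p_mean, weights=w)      # weighted grand mean
    ms_p = n_r * (w * (p_mean - grand) ** 2).sum() / (n_eff - 1)
    ms_r = n_eff * ((r_mean - grand) ** 2).sum() / (n_r - 1)
    resid = scores - p_mean[:, None] - r_mean[None, :] + grand
    ms_pr = (w[:, None] * resid ** 2).sum() / ((n_eff - 1) * (n_r - 1))
    return {"person": max((ms_p - ms_pr) / n_r, 0.0),
            "rater": max((ms_r - ms_pr) / n_eff, 0.0),
            "residual": ms_pr}
```

With all weights equal, the sketch reduces to the ordinary unweighted crossed-design equations, which is a useful sanity check on any weighted variant.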
This paper introduces a graphical method, SEE Repeated-measure data (SEER), for visually analyzing data commonly collected in large-scale surveys, market research, biostatistics, and educational and psychological measurement. Researchers in these disciplines routinely encounter large amounts of repeated-measure data: examples include Law School Admission Test (LSAT) repeater scores, the career paths of college graduates, essay scores in the writing assessments of the National Assessment of Educational Progress (NAEP), and scores derived from different test equating methods in psychometrics. Efficiency, ease of interpretation, applicability, and user interaction are challenges posed by the graphical complexity of visualizing large-scale data sets. To overcome these challenges, the author expands a systematic data-visualization technique, SEER, originally designed to depict career paths and occupational stability for professionals in science and engineering. The author summarizes that original application and highlights its uses in legal education, psychometrics, and related areas. The author also (a) expands the technique to examine repeat test takers' scores, (b) illustrates how to monitor interrater consistency in essay scoring and how to depict multifaceted data that involve human judgment, and (c) demonstrates how to investigate differences among test equating and scaling methods using the SEER method. The broader impacts and design patterns of the SEER method are discussed.
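The paper's SEER displays are not reproduced here, but the trajectory idea behind the repeater-score application can be illustrated generically. The sketch below plots simulated LSAT-like repeater score paths, one line per examinee; the data are fabricated purely for display, and the plotting choices are assumptions rather than the SEER specification.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# Simulated repeater data: 40 examinees with 2-4 attempts each,
# scores drifting from an LSAT-like starting point (not real data).
paths = []
for _ in range(40):
    k = rng.integers(2, 5)                     # number of attempts
    start = rng.normal(150, 8)                 # first-attempt score
    paths.append(np.cumsum(np.r_[start, rng.normal(1.0, 2.5, k - 1)]))

fig, ax = plt.subplots(figsize=(6, 4))
for scores in paths:
    attempts = np.arange(1, len(scores) + 1)
    ax.plot(attempts, scores, marker="o", lw=0.8, alpha=0.5,
            color="steelblue")
ax.set_xlabel("Attempt")
ax.set_ylabel("Score")
ax.set_title("Repeater score paths (SEER-style trajectory view)")
ax.set_xticks([1, 2, 3, 4])
plt.tight_layout()
plt.show()
```

The same trajectory layout extends naturally to the paper's other applications, such as tracking a rater's scores across scoring occasions to monitor interrater consistency.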