Given the serious consequences of making ill‐fated admissions and funding decisions for applicants to graduate and professional school, it is important to rely on sound evidence to optimize such judgments. Previous meta‐analytic research has demonstrated the generalizable validity of the GRE® General Test for predicting academic achievement. That research does not address predictive validity for specific populations and situations, nor the predictive validity of the GRE Analytical Writing section introduced in October 2002. Furthermore, much of the past GRE predictive validity research relies on approaches that are purely correlational and univariate. Stakeholders familiar with GRE predictive validity mainly in the form of zero‐order correlation coefficients might automatically interpret the usefulness of the GRE solely through the prism of Cohen's (1988) guidelines for judging effect sizes, without regard to the larger context. However, by using innovative and multivariate approaches to conceptualize and measure GRE predictive validity within the larger context, our investigation reveals the substantial value of the GRE General Test, including its Analytical Writing section, for predicting graduate school grades.
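To illustrate the contrast between a univariate, zero‐order view and a multivariate view of predictive validity, the following Python sketch uses simulated, hypothetical data (not data from the study) to compute a single Pearson correlation, which might be read against Cohen's (1988) benchmarks, alongside the incremental R² of adding a test score to undergraduate GPA in a regression model. All variable names and effect magnitudes are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical, simulated data: undergraduate GPA, a standardized test
# score, and first-year graduate GPA for 500 students (illustrative only).
n = 500
ugpa = rng.normal(3.3, 0.4, n)
test = 0.5 * (ugpa - 3.3) / 0.4 + rng.normal(0, 1, n)  # correlated with UGPA
ggpa = 3.4 + 0.15 * (ugpa - 3.3) + 0.12 * test + rng.normal(0, 0.3, n)

# Univariate view: zero-order correlation of the test score with graduate GPA.
r, _ = stats.pearsonr(test, ggpa)
print(f"zero-order r = {r:.2f}")  # Cohen (1988): ~.10 small, ~.30 medium, ~.50 large

# Multivariate view: incremental R^2 of the test score over undergraduate GPA.
def r_squared(X, y):
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared(ugpa, ggpa)
r2_full = r_squared(np.column_stack([ugpa, test]), ggpa)
print(f"R^2 UGPA only = {r2_base:.3f}, UGPA + test = {r2_full:.3f}, "
      f"incremental = {r2_full - r2_base:.3f}")
```

The same zero‐order correlation can look "small" under rule‐of‐thumb benchmarks yet still contribute meaningful incremental prediction once other predictors and the decision context are taken into account.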
In this research, we investigated the suitability of implementing e‐rater® automated essay scoring in a high‐stakes, large‐scale English language testing program. We examined the effectiveness of generic scoring and two variants of prompt‐based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement between the automated and the human score and relations with criterion variables. Results showed that the sample size was generally not sufficient for prompt‐specific scoring. For the generic scoring model, automated scores agreed with human raters as strongly as, or more strongly than, human raters agreed with one another for more than 97% of the prompts. Substituting e‐rater for the second human rater had no practically important impact on test takers' scores at either the item or the total test score level. However, neither automated scoring models nor human raters performed invariantly across all prompts or across different test countries/territories. Further investigation indicated homogeneity in the examinee population, possibly nested within test countries/territories, as one potential cause of this lack of invariance. Among other limitations, findings may not be generalizable beyond the examinee population investigated in this study.
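As an illustration of the kind of human–machine agreement comparison described above (not the study's actual scoring engine, data, or agreement statistics), the following Python sketch computes exact agreement, adjacent agreement, and quadratically weighted kappa for a human–human rater pair and a human–automated pair. The score vectors are hypothetical placeholders.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical integer essay scores (0-6 scale) assigned to the same responses.
human1 = np.array([4, 5, 3, 4, 6, 2, 5, 4, 3, 5, 4, 4])
human2 = np.array([4, 4, 3, 5, 6, 3, 5, 4, 3, 4, 4, 5])
auto = np.array([4, 5, 3, 4, 5, 2, 5, 4, 4, 5, 4, 4])  # automated scores

def agreement(a, b):
    """Exact agreement, adjacent (within 1 point) agreement, and quadratic kappa."""
    exact = np.mean(a == b)
    adjacent = np.mean(np.abs(a - b) <= 1)
    qwk = cohen_kappa_score(a, b, weights="quadratic")
    return exact, adjacent, qwk

for label, pair in [("human vs. human", (human1, human2)),
                    ("human vs. automated", (human1, auto))]:
    exact, adjacent, qwk = agreement(*pair)
    print(f"{label}: exact={exact:.2f}, adjacent={adjacent:.2f}, QWK={qwk:.2f}")
```

Comparing the human–automated statistics against the human–human baseline, prompt by prompt and subgroup by subgroup, is one common way to evaluate whether an automated score can stand in for a second human rating.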