This brief describes the results of a data challenge to evaluate the potential of using automated scoring techniques to score open-ended responses to reading assessment items. The purpose of the challenge was to help NAEP determine the current capabilities of automated scoring, its accuracy, the validity evidence underlying assigned scores, and its costs and efficiencies.
Natural language processing (NLP) is widely used to predict human scores for open-ended student responses in educational assessments across various content areas [9]. Ensuring algorithmic fairness and minimizing distortion based on student demographic background and other contextual factors is crucial so that automated scoring does not misrepresent student populations [14]. This study presents a fairness analysis of six top-performing entries from a data challenge involving 20 reading comprehension items from the National Assessment of Educational Progress (NAEP). These submissions were initially analyzed for fairness based on race/ethnicity and gender. This study describes an additional fairness evaluation by the challenge team that incorporated further contextual variables, including English Language Learner status, Individualized Education Plans, and Free and Reduced-Price Lunch eligibility. For several items, predicted scores were less accurate for some student subgroups, particularly English Language Learners, than for other students. The study recommends considering additional demographic factors in future fairness evaluations of automated scoring, as well as new approaches to fairness analysis that consider multiple factors and the contexts in which student identity is experienced.
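To make the subgroup comparison described above concrete, the following is a minimal, illustrative Python sketch (not the challenge team's actual evaluation code) of a per-group agreement check: it compares human-machine agreement, measured as quadratic weighted kappa, across levels of a demographic variable. The DataFrame layout and column names (human_score, predicted_score, ell_status) are hypothetical placeholders.

```python
# Minimal sketch of a per-subgroup fairness check, assuming a DataFrame with
# one row per scored response. Column names are hypothetical placeholders;
# this is not the challenge team's actual evaluation code.
import pandas as pd
from sklearn.metrics import cohen_kappa_score


def subgroup_agreement(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Quadratic weighted kappa between human and predicted scores, per subgroup."""
    return df.groupby(group_col).apply(
        lambda g: cohen_kappa_score(
            g["human_score"], g["predicted_score"], weights="quadratic"
        )
    )


# Example usage: compare each subgroup's agreement to the overall value and
# flag groups (e.g., English Language Learners) falling noticeably below it.
# overall = cohen_kappa_score(df["human_score"], df["predicted_score"],
#                             weights="quadratic")
# print(subgroup_agreement(df, "ell_status") - overall)
```

A gap between a subgroup's kappa and the overall kappa, of the kind this sketch surfaces, is one simple signal of the lower predicted-score accuracy for some subgroups reported in the study.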