Large neural network-based language models play an increasingly important role in contemporary AI. Although these models demonstrate sophisticated text generation capabilities, they have also been shown to reproduce harmful social biases contained in their training data. This paper presents a project that guides students through an exploration of social biases in large language models.
As a final project for an intermediate college course in Artificial Intelligence, students developed a bias probe task for a previously unstudied aspect of sociolinguistic or sociocultural bias that interested them. By constructing a dataset and an evaluation metric to measure bias, students mastered key technical concepts, including how to run contemporary neural networks for natural language processing tasks, construct datasets and evaluation metrics, and analyze experimental results. Students reported their findings in an in-class presentation and a final report, recounting patterns of predictions that surprised and unsettled them and that sparked their interest in advocating for technology that reflects a more diverse set of backgrounds and experiences.
Through this project, students engage with and even contribute to a growing body of scholarly work on social biases in large language models.