Discovery environments are ubiquitous in academic libraries but studying their effectiveness and use in an academic environment has mostly centered around user satisfaction, experience, and task analysis. This study aims to create a quantitative, reproducible framework to test the relevancy of results and the overall success of Washington State University’s discovery environment (Primo by Ex Libris). Within this framework, the authors use bibliographic citations from student research papers submitted as part of a required university class as the proxy for relevancy. In the context of this study, the researchers created a testing model that includes: (1) a process to produce machine-generated keywords from a corpus of research papers to compare against a set of human-created keywords, (2) a machine process to query a discovery environment to produce search result lists to compare against citation lists, and (3) four metrics to measure the comparative success of different search strategies and the relevancy of the results. This framework is used to move beyond a sentiment or task-based analysis to measure if materials cited in student papers appear in the results list of a production discovery environment. While this initial test of the framework produced fewer matches between researcher-generated search results and student bibliography sources than expected, the authors note that faceted searches represent a greater success rate when compared to open-ended searches. Future work will include comparative (A/B) testing of commonly deployed discovery layer configurations and limiters to measure the impact of local decisions on discovery layer efficacy as well as noting where in the results list a citation match occurs.