“…While population genetics has always used statistical methods to make inferences from data, the questions, models, data, and computational approaches it uses have all grown considerably more sophisticated over the past two decades. A wide range of computational methods now exists to infer the histories of populations (Gutenkunst et al., 2009; Li and Durbin, 2011; Excoffier et al., 2013; Schiffels and Durbin, 2014; Terhorst et al., 2017; Ragsdale and Gravel, 2019), the distribution of fitness effects (Boyko et al., 2008; Kim et al., 2017; Tataru et al., 2017; Fortier et al., 2019; Huang and Siepel, 2019; Vecchyo et al., 2019), recombination rates (McVean et al., 2004; Chan et al., 2012; Lin et al., 2013; Adrion et al., 2020; V Barroso et al., 2019), and the extent of positive selection in genome sequence data (Kim and Stephan, 2002; Eyre-Walker and Keightley, 2009; Alachiotis et al., 2012; Garud et al., 2015; DeGiorgio et al., 2016; Kern and Schrider, 2018; Sugden et al., 2018). Although these methods have undoubtedly increased our understanding of genetic and evolutionary processes, very little has been done to systematically benchmark the quality of their inferences or their robustness to deviations from their underlying assumptions.…”