“…While past studies have shown that predictive models can be useful for assessing public health hazards in recreational water (Olyphant, 2005; Hou et al., 2006; Hamilton and Luffman, 2009; Francy et al., 2013; Francy et al., 2014; Dada and Hamilton, 2016; Dada, 2019; Rossi et al., 2020), no models, to the author's knowledge, have been developed to predict E. coli levels in surface water used for produce production (e.g., for irrigation, pesticide application, dust abatement, frost protection). Moreover, many of the recreational water quality studies considered only one algorithm during model development (e.g., Olyphant, 2005; Hamilton and Luffman, 2009), including algorithms [e.g., regression (Olyphant, 2005; Hamilton and Luffman, 2009)] that make more assumptions and may be less accurate than alternative algorithms [e.g., ensemble methods, support vector machines (Kuhn and Johnson, 2016; Weller et al., 2020a)]. As such, there is limited data on 1) how models for predicting E. coli levels in agricultural water should be implemented and validated, or 2) how the data used to train these models should be collected (e.g., which types of features data collection efforts should focus on).…”