2021
DOI: 10.3389/frai.2021.628441

Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water

Abstract: Since E. coli is considered a fecal indicator in surface water, government water quality standards and industry guidance often rely on E. coli monitoring to identify when there is an increased risk of pathogen contamination of water used for produce production (e.g., for irrigation). However, studies have indicated that E. coli testing can present an economic burden to growers and that time lags between sampling and obtaining results may reduce the utility of these data. Models that predict E. coli levels in a…


Cited by 24 publications (16 citation statements)
References 69 publications
“…While model performance generally did not significantly differ, the RF model was found to provide consistently better performance than any of the other models evaluated. Several publications focusing on ML model evaluation specifically for microbial water quality purposes have reached similar conclusions (Avila et al., 2018; Chen et al., 2020; Weller et al., 2021). The SGB model was found to provide the second-best performance across all three metrics from all datasets.…”
Section: Discussion (supporting; confidence: 65%)
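The RF-versus-SGB comparison described in this excerpt can be sketched in scikit-learn. This is a minimal illustration on synthetic data, not the study's actual configuration: the hyperparameters are arbitrary, and `GradientBoostingRegressor` with `subsample < 1.0` stands in for stochastic gradient boosting (SGB).

```python
# Illustrative comparison of RF vs. SGB-style boosting by cross-validated
# RMSE on synthetic regression data (not the study's data or settings).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

models = {
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    # subsample < 1.0 makes gradient boosting "stochastic" (SGB)
    "SGB": GradientBoostingRegressor(n_estimators=200, subsample=0.7,
                                     random_state=0),
}

rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()  # negate: sklearn reports negated RMSE

for name, value in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: cross-validated RMSE = {value:.2f}")
```

Which learner wins depends on the data; the excerpt's finding that RF was consistently best would not necessarily reproduce on other datasets.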
“…The SVM algorithm typically provided lower RMSE than the kNN algorithm, which may be because SVM is robust to outliers, especially when non-linear kernels are used. The RBF kernel was used in this study because it provided substantially better results than the linear or polynomial kernels (data not shown), which was also reported by Weller et al. (2021), who used ML algorithms to predict E. coli in NY streams. Several other water quality studies have also reported better performance of SVM than kNN when the two have been compared (Modaresi and Araghinejad, 2014; Danades et al., 2016; Babbar and Babbar, 2017; Prakash et al., 2018; Chen et al., 2020).…”
Section: Discussion (mentioning; confidence: 71%)
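The kernel comparison in this excerpt can be illustrated with a small scikit-learn sketch: SVM regression with RBF and linear kernels against kNN on a deliberately non-linear synthetic target. The data, kernel settings, and neighbor count are illustrative assumptions; note that both SVM and kNN are scale-sensitive, hence the standardization step.

```python
# Illustrative sketch: SVR (RBF vs. linear kernel) vs. kNN regression
# on synthetic non-linear data; standardize features for both methods.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
# Non-linear target: a linear kernel cannot capture this relationship
y = np.sin(X[:, 0]) * X[:, 1] + rng.normal(scale=0.2, size=300)

models = {
    "SVM (RBF)": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    "SVM (linear)": make_pipeline(StandardScaler(), SVR(kernel="linear")),
    "kNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}

rmse = {name: -cross_val_score(m, X, y, cv=5,
                               scoring="neg_root_mean_squared_error").mean()
        for name, m in models.items()}

for name, value in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: cross-validated RMSE = {value:.3f}")
```

On data like this, the RBF kernel should clearly outperform the linear kernel, mirroring the pattern the excerpt reports; on a genuinely linear problem the ordering could reverse.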
“…Conversely, the current study focuses on 1) the development and comparison of predictive models using different algorithms and feature types, 2) the impact of resampling methods (to address data imbalance) on model performance, and 3) the identification of features that drive model accuracy. Moreover, unlike previous, applied studies that developed models to predict the presence of food safety hazards in agricultural water using balanced presence-absence (Polat et al., 2019; Weller et al., 2020c) or continuous (Weller et al., 2021) data, the current study focuses on predicting Listeria contamination using moderately (nonpathogenic Listeria spp.) and severely (L. monocytogenes) imbalanced presence-absence data (Table 1).…”
Section: Methods (mentioning; confidence: 99%)
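One resampling approach for the data imbalance this excerpt mentions is random oversampling of the minority class. The sketch below is a minimal illustration using only scikit-learn utilities on synthetic data; the excerpted study may have used different or additional resampling methods. Oversampling is applied to the training split only, so the held-out set keeps the real-world imbalance.

```python
# Illustrative sketch: random oversampling of the minority class before
# training a classifier on severely imbalanced presence-absence data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Severely imbalanced binary data (~5% positives)
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority class in the TRAINING split only
minority = y_tr == 1
X_min_up, y_min_up = resample(X_tr[minority], y_tr[minority],
                              n_samples=int((~minority).sum()),
                              random_state=0)  # sample with replacement
X_bal = np.vstack([X_tr[~minority], X_min_up])
y_bal = np.concatenate([y_tr[~minority], y_min_up])

clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print("balanced training class counts:", np.bincount(y_bal))
```

Evaluating `clf` on the untouched `X_te`/`y_te` split then gives an honest estimate of performance under the original imbalance.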
“…Models built using all four feature types were designated "full models." The 15 learners used to build the full models can be grouped into 1) tree-based learners, 2) ensemble learners (or forests), 3) regression and rule-based learners, and 4) support vector machines [SVM; for descriptions of each learner, as well as its (dis)advantages and tunable parameters, see Bischl et al. (2016b), Kuhn and Johnson (2016), Weller et al. (2020c), and Weller et al. (2021)]. Separately from the full models, "nested models" were developed to assess the relative information gain associated with using different feature types for model training.…”
Section: Predictive Models (mentioning; confidence: 99%)
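The nested-model idea in this excerpt, training the same learner on progressively larger feature subsets and comparing performance to gauge each feature type's information gain, can be sketched as follows. The feature groupings, learner, and data below are invented for illustration and do not correspond to the study's actual feature types.

```python
# Illustrative "nested models": one learner, nested feature subsets,
# compared by cross-validated R^2 to estimate per-subset information gain.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# shuffle=False keeps the 8 informative columns first, so the invented
# groupings below nest inside one another.
X, y = make_regression(n_samples=300, n_features=12, n_informative=8,
                       noise=10.0, shuffle=False, random_state=0)

# Hypothetical feature-type groupings (column indices)
feature_sets = {
    "type A only": list(range(4)),
    "types A+B": list(range(8)),
    "full (all types)": list(range(12)),
}

r2 = {}
for name, cols in feature_sets.items():
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    r2[name] = cross_val_score(model, X[:, cols], y, cv=5,
                               scoring="r2").mean()

for name, score in r2.items():
    print(f"{name}: mean cross-validated R^2 = {score:.3f}")
```

The gap in R^2 between successive subsets indicates how much predictive information the added feature type contributes, which is the comparison the nested models in the excerpt are designed to make.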