Preface

Nisbet, Bob; Elder, John P.; Miner, Gary

doi:10.1016/b978-0-12-374765-5.00039-5

Cited by 1 publication

(1 citation statement)

References 0 publications


“…Data leakage or information leakage is when a model gains information from unavailable or unseen data during the training phase, which leads to biased results. Data leakage can take place throughout the entire modeling process, including data collection, data preprocessing, model implementation, and model optimization. However, despite its importance, DLM is commonly overlooked in environmental research.…”

Section: Common Pitfalls and Good Practices (mentioning)

confidence: 99%

Machine Learning in Environmental Research: Common Pitfalls and Best Practices

Zhu, Yang, Ren (2023). Environ. Sci. Technol.

125


Machine learning (ML) is increasingly used in environmental research to process large data sets and decipher complex relationships between system variables. However, due to the lack of familiarity and methodological rigor, inadequate ML studies may lead to spurious conclusions. In this study, we synthesized literature analysis with our own experience and provided a tutorial-like compilation of common pitfalls along with best practice guidelines for environmental ML research. We identified more than 30 key items and provided evidence-based data analysis based on 148 highly cited research articles to exhibit the misconceptions of terminologies, proper sample size and feature size, data enrichment and feature selection, randomness assessment, data leakage management, data splitting, method selection and comparison, model optimization and evaluation, and model explainability and causality. By analyzing good examples on supervised learning and reference modeling paradigms, we hope to help researchers adopt more rigorous data preprocessing and model development standards for more accurate, robust, and practicable model uses in environmental research and applications.
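
The citation statement above notes that data leakage can enter during data preprocessing. As a minimal sketch of that one case, assuming a simple standardization step, the Python below estimates scaling statistics from the training split only and reuses them on the test split. The data values and the helper names (`train_test_split`, `fit_scaler`, `transform`) are hypothetical illustrations, not code or data from the cited paper.

```python
import random
import statistics


def train_test_split(values, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into train and test lists."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Return (mean, population standard deviation) of the given values only."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical measurements; not data from the cited study.
data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 100.0]

train, test = train_test_split(data)

# Leaky: statistics are estimated from all samples, including the test split.
leaky_mean, leaky_std = fit_scaler(data)

# Leakage-free: statistics are estimated from the training split only,
# then reused unchanged on the test split.
clean_mean, clean_std = fit_scaler(train)
train_scaled = transform(train, clean_mean, clean_std)
test_scaled = transform(test, clean_mean, clean_std)
```

Under these assumptions, the "leaky" statistics differ from the training-only ones because they have already seen the held-out samples. The same fit-on-train, apply-to-test pattern would apply to other preprocessing steps such as imputation or feature selection.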

