Accurate field data are essential to understanding ecological systems and forecasting their responses to global change. Yet, data collection errors are common, and data analysis often lags far enough behind its collection that many errors can no longer be corrected, nor can anomalous observations be revisited. Needed is a system in which data quality assurance and control (QA/QC), along with the production of basic data summaries, can be automated immediately following data collection.
Here, we implement and test a system to satisfy these needs. For two annual tree mortality censuses and a dendrometer band survey at two forest research sites, we used GitHub Actions continuous integration (CI) to automate data QA/QC and run routine data wrangling scripts to produce cleaned datasets ready for analysis.
This system automation had numerous benefits, including (1) the production of near real‐time information on data collection status and errors requiring correction, resulting in final datasets free of detectable errors, (2) an apparent learning effect among field technicians, wherein original error rates in field data collection declined significantly following implementation of the system, and (3) an assurance of computational reproducibility—that is, robustness of the system to changes in code, data and software.
By implementing CI, researchers can ensure that datasets are free of any errors for which a test can be coded. The result is dramatically improved data quality, increased skill among field technicians, and reduced need for expert oversight. Furthermore, we view CI implementation as a first step towards a data collection and analysis pipeline that is also more responsive to rapidly changing ecological dynamics, making it better suited to study ecological systems in the current era of rapid environmental change.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.