Aim
Palaeoecological data are crucial for understanding large‐scale biodiversity patterns and the natural and anthropogenic drivers that influence them over time. Over the last decade, the availability of open‐access research databases of palaeoecological proxies has substantially increased. These databases open the door to research questions that require advanced numerical analyses and modelling based on big‐data compilations. However, compiling and analysing palaeoecological data pose unique challenges that call for a guide to producing standardized and reproducible compilations.
Innovation
We present a step‐by‐step guide to processing fossil pollen data into a standardized dataset compilation ready for macroecological and palaeoecological analyses. We describe successive criteria that enhance the quality of the compilations. Although these criteria depend on the project and research question, we discuss the most important assumptions that should be considered and adjusted accordingly. Our guide is accompanied by an R‐workflow, FOSSILPOL, and a corresponding R‐package, R‐Fossilpol, which together provide a detailed protocol ready for interdisciplinary users. We illustrate the workflow by sourcing and processing Scandinavian fossil pollen datasets and show the reproducibility of continental‐scale data processing.
Main Conclusions
The study of biodiversity and macroecological patterns through time and space requires large‐scale syntheses of palaeoecological datasets. The data preparation for such syntheses must be transparent and reproducible. With our FOSSILPOL workflow and R‐package, we provide a protocol for optimal handling of large compilations of fossil pollen datasets and workflow reproducibility. Our workflow is also relevant for the compilation and synthesis of other palaeoecological proxies and as such offers a guide for synthetic and cross‐disciplinary analyses with macroecological, biogeographical and palaeoecological perspectives. However, we emphasize that expertise and informed decisions based on palaeoecological knowledge remain crucial for high‐quality data syntheses and should be strongly embedded in studies that rely on the increasing amount of open‐access palaeoecological data.