Large‐scale citizen‐science projects, such as atlases of species distribution, are an important source of data for macroecological research, for understanding the effects of climate change and other drivers on biodiversity, and for more applied conservation tasks, such as early‐warning systems for biodiversity loss.
However, citizen‐science data are challenging to analyse because the observation process has to be taken into account. Typically, the observation process leads to heterogeneous and non‐random sampling, false absences, false detections, and spatial correlations in the data. Increasingly, occupancy models are being used to analyse atlas data.
We advocate a dual approach to strengthen inference from citizen science data for the questions the programme is intended to address: (a) the survey design should be chosen with a particular set of questions and associated analysis strategy in mind and (b) the statistical methods should be tailored not only to those questions but also to the specific characteristics of the data.
We review the consequences of particular survey design choices that typically need to be made in atlas‐style citizen‐science projects. These include spatial resolution of the sampling units, allocation of effort in space, and collection of information about the observation process. On the analysis side, we review extensions of the basic occupancy models that are frequently necessary with atlas data, including methods for dealing with heterogeneity, non‐independent detections, false detections, and violation of the closure assumption.
New technologies, such as cell‐phone apps and fixed remote detection devices, are revolutionizing citizen‐science projects. There is an opportunity to maximize the usefulness of the resulting datasets if the protocols are rooted in robust statistical designs and data analysis issues are being considered. Our review provides guidelines for designing new projects and an overview of the current methods that can be used to analyse data from such projects.