Aim: To improve the accuracy of inferences on habitat associations and distribution patterns of rare species by combining machine-learning, spatial filtering and resampling to address class imbalance and spatial bias of large volumes of citizen science data.
Innovation: Modelling rare species' distributions is a pressing challenge for conservation and applied research. Often, a large number of surveys are required before enough detections occur to model distributions of rare species accurately, resulting in a data set with a high proportion of non-detections (i.e. class imbalance). Citizen science data can provide a cost-effective source of surveys but likely suffer from class imbalance. Citizen science data also suffer from spatial bias, likely from preferential sampling. To correct for class imbalance and spatial bias, we used spatial filtering to under-sample the majority class (non-detection) while maintaining all of the limited information from the minority class (detection). We investigated the use of spatial under-sampling with randomForest models and compared it to common approaches used for imbalanced data, the synthetic minority oversampling technique (SMOTE), weighted random forest and balanced random forest models. Model accuracy was assessed using kappa, Brier score and AUC. We demonstrate the method by evaluating habitat associations and seasonal distribution patterns using citizen science data for a rare species, the tricoloured blackbird (Agelaius tricolor).
Main Conclusions: Spatial under-sampling increased the accuracy of each model and outperformed the approach typically used to direct under-sampling in the SMOTE algorithm. Our approach is the first to characterize winter distribution and movement of tricoloured blackbirds. Our results show that tricoloured blackbirds are positively associated with grassland, pasture and wetland habitats, and negatively associated with high elevations or evergreen forests during both winter and breeding seasons. The seasonal differences in distribution indicate that individuals move to the coast during the winter, as suggested by historical accounts.
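The spatial under-sampling step described under Innovation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the spatial filter is defined, so the square grid, the `cell_size` value, the rule of keeping one randomly chosen non-detection per cell, and the function name `spatial_undersample` are all assumptions made for illustration. What the sketch does take from the text is the asymmetry: every detection is retained, and only non-detections are thinned.

```python
import math
import numpy as np


def spatial_undersample(lon, lat, detected, cell_size=0.5, seed=0):
    """Return sorted row indices to keep.

    Keeps every detection (minority class) and, as an assumed filtering
    rule, one randomly chosen non-detection per square grid cell.
    """
    rng = np.random.default_rng(seed)
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    detected = np.asarray(detected, dtype=bool)

    det_idx = np.flatnonzero(detected)       # all detections are retained
    non_idx = np.flatnonzero(~detected)      # non-detections are thinned

    # Visit non-detections in random order; the first one seen in each
    # grid cell is kept, so the survivor within a cell is random.
    kept_by_cell = {}
    for i in rng.permutation(non_idx):
        cell = (math.floor(lon[i] / cell_size), math.floor(lat[i] / cell_size))
        kept_by_cell.setdefault(cell, int(i))

    keep = [int(i) for i in det_idx] + list(kept_by_cell.values())
    return np.sort(np.array(keep, dtype=int))


# Hypothetical toy checklist data: two clusters of non-detections,
# each sharing a grid cell, plus two detections.
lon = [0.01, 0.02, 0.03, 0.55, 0.56, 0.95]
lat = [0.01, 0.02, 0.03, 0.55, 0.56, 0.95]
detected = [0, 0, 1, 0, 0, 1]

keep = spatial_undersample(lon, lat, detected, cell_size=0.5, seed=1)
print(keep)
```

The retained rows could then be passed to any classifier. Thinning by grid cell reduces the number of non-detections most where surveys are densest, which is how a single step can address class imbalance and spatial bias together; the appropriate cell size would depend on the survey data and is not given here.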
KEYWORDS: citizen science, class imbalance, random forest, spatial bias, species distribution model, tricoloured blackbird
ROBINSON et al. | 461