Abstract. Atmospheric observations in remote locations offer the opportunity to explore trace gas and particle concentrations in pristine environments. However,
data from remote areas are often contaminated by pollution from local
sources. Detecting this contamination is thus a central and frequently
encountered issue. Consequently, many different methods exist today to
identify local contamination in atmospheric composition measurement time
series, but no single method has been widely accepted. In this study, we
present a new method to identify primary pollution in remote atmospheric
datasets, e.g., from ship campaigns or stations with a low background signal compared to the contaminated signal. The pollution detection algorithm (PDA) identifies and flags periods of polluted data in five steps. The first and most important step identifies polluted periods based on the time derivative of a concentration. If this derivative exceeds a given threshold, the data are flagged as polluted. Further pollution
identification steps are a simple concentration threshold filter, a
neighboring points filter (optional), a median filter, and a sparse data filter (optional). The PDA relies only on the target dataset itself and is
independent of ancillary datasets such as meteorological variables. All
parameters of each step are adjustable so that the PDA can be “tuned” to
be more or less stringent (i.e., to flag more or fewer data points as contaminated). The PDA was developed and tested with a particle number concentration
dataset collected during the Multidisciplinary drifting Observatory for the
Study of Arctic Climate (MOSAiC) expedition in the central Arctic. Using strict settings, we identified 62 % of the data as influenced by local
contamination. Using a second independent particle number concentration
dataset also collected during MOSAiC, we evaluated the performance of the
PDA against the same dataset cleaned by visual inspection. The two methods
agreed in 94 % of the cases. Additionally, the PDA was successfully
applied to a trace gas dataset (CO2), also collected during MOSAiC, and to another particle number concentration dataset, collected at the high-altitude background station Jungfraujoch, Switzerland. Thus, the PDA
proves to be a useful and flexible tool to identify periods affected by
local contamination in atmospheric composition datasets without the need for ancillary measurements. It is best applied to data representing primary
pollution. The user-friendly and open-access code enables reproducible application to a wide range of datasets. It is available at https://doi.org/10.5281/zenodo.5761101 (Beck et al., 2021).
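
As an illustration of the first steps described above, the following Python sketch flags data using a time-derivative threshold, a concentration threshold, and a neighboring points rule. It is not the published PDA code. The function names, the fixed thresholds, the choice to flag both points bounding a steep change, and the specific neighbor rule are all assumptions made for this example, and the median and sparse data filters are omitted.

```python
from typing import List, Sequence


def gradient_flags(times: Sequence[float], conc: Sequence[float],
                   max_gradient: float) -> List[bool]:
    """Flag points where |d(conc)/dt| between consecutive samples exceeds max_gradient.

    Assumption: both points bounding a steep change are flagged.
    """
    flags = [False] * len(conc)
    for i in range(1, len(conc)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            continue  # skip duplicate or unordered timestamps
        if abs(conc[i] - conc[i - 1]) / dt > max_gradient:
            flags[i - 1] = True
            flags[i] = True
    return flags


def threshold_flags(conc: Sequence[float], max_conc: float) -> List[bool]:
    """Flag points whose concentration exceeds a fixed upper limit."""
    return [c > max_conc for c in conc]


def neighbor_flags(flags: Sequence[bool]) -> List[bool]:
    """Flag an unflagged point when both adjacent points are flagged.

    Assumption: this is one plausible reading of a neighboring points filter.
    """
    out = list(flags)
    for i in range(1, len(flags) - 1):
        if flags[i - 1] and flags[i + 1]:
            out[i] = True
    return out


def detect_pollution(times: Sequence[float], conc: Sequence[float],
                     max_gradient: float, max_conc: float) -> List[bool]:
    """Combine the three illustrative filters into one polluted/clean mask."""
    grad = gradient_flags(times, conc, max_gradient)
    thr = threshold_flags(conc, max_conc)
    combined = [g or t for g, t in zip(grad, thr)]
    return neighbor_flags(combined)


if __name__ == "__main__":
    # Hypothetical 1 min data: time in seconds, concentration in cm^-3.
    t = [0, 60, 120, 180, 240, 300]
    n = [100, 102, 101, 900, 950, 103]
    print(detect_pollution(t, n, max_gradient=1.0, max_conc=500.0))
```

Lowering `max_gradient` or `max_conc` in this sketch makes the filter more stringent, mirroring the tunable behavior described in the abstract. Because only the concentration time series and its timestamps are used, no ancillary data are needed.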