Camera traps are a popular tool in terrestrial wildlife research because of their low cost, ease of operation, and usefulness for studying a wide array of species and research questions. The vast numbers of images they generate often require multiple human data extractors, yet accuracy and inter-observer variance are rarely considered. We compared results from 10 observers who processed the same set of multi-species camera trap images (n = 11,560) from seven sites. We quantified inter-observer agreement and variance for (1) the number of mammals identified, (2) the number of images saved, (3) species identification accuracy and the types of mistakes made, and (4) counts of herbivore groups and individuals. We analysed the influence of observer experience, species distinctiveness and camera location. Observers differed significantly in image processing rates, the number of mammals found and images saved, and species misidentifications. Only one observer detected all 22 mammals (range: 18–22, n = 10). Experienced observers processed images up to 4.5 times faster and made fewer mistakes in species detection and identification. Missed species were mostly small mammals (56.5%), while misidentifications were most common among species with low phenotypic distinctiveness. Herbivore counts showed high to very high variance, with mainly moderate agreement across observers. Observers differed in how they processed images and what they recorded. Our results raise important questions about the reliability of data extracted by multiple observers. Inter-observer bias, observer-related variables, species distinctiveness and camera location are important considerations if camera trapping results are to be used for population estimates or biodiversity assessments.