Camera traps facilitate non-invasive wildlife monitoring, but their widespread adoption has created a data processing bottleneck: a camera trap survey can generate millions of images, and the labour required to review those images strains the resources of conservation organisations. AI is a promising approach for accelerating image review, but AI tools for camera trap data remain imperfect; in particular, classifying small animals is difficult, and accuracy degrades outside the ecosystems in which a model was trained. It has been proposed that incorporating an object detector into an image analysis pipeline may help address these challenges, but the benefit of object detection has not been systematically evaluated in the literature. In this work, we assess the hypothesis that classifying animals cropped from camera trap images by a species-agnostic detector yields better accuracy than classifying whole images. We find that incorporating an object detection stage into an image classification pipeline improves macro-average F1 by approximately 25% on a large, long-tailed dataset, and that this improvement is reproducible on a large public dataset and a smaller public benchmark dataset. We also describe a classification architecture that performs well for both whole images and detector-cropped images, and demonstrate that this architecture achieves state-of-the-art benchmark accuracy.
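
As a rough illustration of the two-stage pipeline evaluated here, the sketch below runs a generic detector over an image, crops the resulting boxes, and classifies each crop rather than the whole image. The torchvision Faster R-CNN and ResNet-50 models and the 0.5 score threshold are placeholder assumptions for illustration only, not the detector, classifier, or settings used in this work.

```python
# Minimal sketch of a detect-then-classify pipeline: a class-agnostic detector
# localises candidate animals, and a classifier is applied to each crop.
# The models below are generic stand-ins, not those used in the paper.
import torch
from torchvision import models, transforms
from PIL import Image

detector = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
classifier = models.resnet50(weights="DEFAULT").eval()  # stand-in species classifier

to_tensor = transforms.ToTensor()
classify_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def classify_crops(image_path: str, score_threshold: float = 0.5):
    """Detect candidate animals, crop them, and classify each crop."""
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        detections = detector([to_tensor(image)])[0]
    results = []
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score < score_threshold:
            continue  # discard low-confidence detections
        crop = image.crop(tuple(box.tolist()))  # boxes are (x1, y1, x2, y2)
        with torch.no_grad():
            logits = classifier(classify_tf(crop).unsqueeze(0))
        results.append((box.tolist(), logits.softmax(dim=1).argmax(dim=1).item()))
    return results
```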