Quantitative behavioral measurements are important for answering questions across scientific disciplines, from neuroscience to ecology. State-of-the-art deep-learning methods offer major advances in data quality and detail by allowing researchers to automatically estimate locations of an animal's body parts directly from images or videos. However, currently available animal pose estimation methods have limitations in speed and robustness. Here we introduce a new easy-to-use software toolkit, DeepPoseKit, that addresses these problems using an efficient multi-scale deep-learning model, called Stacked DenseNet, and a fast GPU-based peak-detection algorithm for estimating keypoint locations with subpixel precision. These advances improve processing speed >2× with no loss in accuracy compared to currently available methods. We demonstrate the versatility of our methods with multiple challenging animal pose estimation tasks in laboratory and field settings, including groups of interacting individuals. Our work reduces barriers to using advanced tools for measuring behavior and has broad applicability across the behavioral sciences.

et al. (2018) and Pereira et al. (2019), who make use of a popular type of machine learning model called convolutional neural networks, or CNNs (LeCun et al., 2015; Appendix 1), to automatically measure detailed representations of animal posture (structural keypoints, or joints, on the animal's body) directly from images and without markers. While these methods offer a major advance over conventional methods with regard to data quality and detail, they have disadvantages in terms of speed and robustness, which may limit their

inference speed for the DeepLabCut model (Mathis et al., 2018) can be improved by decreasing the resolution

models to the models from Mathis et al. (2018) (DeepLabCut) and Pereira et al.
(2019) (LEAP) in terms of speed, accuracy, training time, and generalization ability. We benchmarked these models using three image datasets recorded in the laboratory and the field, including multiple interacting individuals that were first localized and cropped from larger, multi-individual images (see "Methods" for details).

and TensorFlow teams, and Alexander Jung for their open-source contributions, which provided the core programming interface for our work. We thank A. Strandburg-Peshkin, Vivek H. Sridhar, Michael L. Smith, and Joseph B. Bak-Coleman for their helpful discussions on the project and comments on the manuscript. We also thank M.L.S. for the use of his GPU. We thank Felicitas Oehler for annotating the zebra posture data and Chiara Hirschkorn for assistance with filming the locusts and annotating the locust posture data. We thank