Our understanding of collective animal behavior is limited by our ability to track each of the individuals. We describe an algorithm and software, idtracker.ai, that extracts from video all trajectories with correct identities at high accuracy for collectives of up to 100 individuals. It uses two deep networks, one detecting when animals touch or cross and another for animal identification, trained adaptively to the conditions and difficulty of the video.

Obtaining animal trajectories from a video faces the problem of how to track animals with correct identities after they touch, cross or are occluded by environmental features. To bypass this problem, we proposed in idTracker the idea of tracking by identification of each individual using a set of reference images obtained from the video [1]. idTracker and further developments in animal identification algorithms [2-6] can work for small groups of 2-15 individuals. In larger groups, they only work for particular videos with few animal crossings [7] or with few crossings of particular species-specific features [5].

Here we present idtracker.ai, a system to track all individuals in small or large collectives (up to 100 individuals) at high identification accuracy, often of > 99.9%. The method is species-agnostic, and we have tested it in small and large collectives of zebrafish, Danio rerio, and flies, Drosophila melanogaster. Code, a quickstart guide and the data used are provided (see Methods), and the Supplementary Text describes the algorithms and gives pseudocode. A graphical user interface walks users through tracking, exploration and validation (Fig. 1a).

Similar to idTracker [1], but with different algorithms, idtracker.ai identifies animals using their visual features. In idtracker.ai, animal identification is done by adapting deep learning [8-10] to work on videos of animal collectives thanks to specific training protocols. In brief, it consists of a series of processing steps summarized in Fig. 1b.
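To make the division of labor between the two networks concrete, the following is a minimal, purely illustrative sketch of the front of the pipeline: flagging frames whose segmented blob is too large to be a single animal (a stand-in for the crossing-detector network) and splitting the remaining frames into crossing-free fragments that could feed the identification network. All function names and the area heuristic are assumptions for illustration, not the actual idtracker.ai API.

```python
def detect_crossings(blob_areas, single_animal_area, tolerance=1.5):
    """Flag frames whose blob area suggests touching/crossing animals.

    A crude area-threshold stand-in for the crossing-detector deep
    network described in the text; `tolerance` is a hypothetical knob.
    """
    return [area > tolerance * single_animal_area for area in blob_areas]


def split_fragments(crossing_flags):
    """Group consecutive crossing-free frame indices into fragments.

    These fragments are the portions of video between detected
    crossings from which identification training images are drawn.
    """
    fragments, current = [], []
    for frame, is_crossing in enumerate(crossing_flags):
        if is_crossing:
            if current:
                fragments.append(current)
                current = []
        else:
            current.append(frame)
    if current:
        fragments.append(current)
    return fragments
```

For example, with blob areas `[10, 10, 25, 10, 10, 10]` and a single-animal area of 10, frame 2 is flagged as a crossing and the video splits into the fragments `[0, 1]` and `[3, 4, 5]`.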
After image preprocessing, the first deep network finds when animals are touching or crossing. Then the system uses the images between these detected crossings to train a second deep network for animal identification. The system first assumes that a single portion of video in which animals do not touch or cross has enough images to properly train the identification network (Protocol 1). However, animals touch or cross often, so this portion is then typically very short, making the system estimate that the identification quality is too low. If this happens, two extra
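The Protocol 1 decision described above can be sketched as a simple gate: train on the longest crossing-free portion of the video, then escalate if the estimated identification quality falls short. This is a hypothetical sketch of that control flow only; the threshold values, the `estimate_quality` callback and all names are assumptions, not idtracker.ai's actual code or parameters.

```python
# Hypothetical thresholds for illustration only.
MIN_IMAGES_PER_ANIMAL = 30   # minimum training images per individual
QUALITY_THRESHOLD = 0.99     # minimum estimated identification quality


def protocol_1(fragment_lengths, estimate_quality):
    """Decide whether training on the longest crossing-free fragment
    suffices, or whether the system must fall back to extra protocols.

    `fragment_lengths` -- images per animal in each crossing-free fragment.
    `estimate_quality` -- callback returning an estimated identification
    quality (e.g. held-out accuracy) for a given number of training images.
    """
    longest = max(fragment_lengths)
    if longest < MIN_IMAGES_PER_ANIMAL:
        return "fallback"            # too few images to train reliably
    quality = estimate_quality(longest)
    return "accept" if quality >= QUALITY_THRESHOLD else "fallback"
```

In a video with frequent crossings, `fragment_lengths` are short, the quality estimate is low, and the function returns `"fallback"`, mirroring the escalation the text describes.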