Technological advances in psychological research have enabled large-scale studies of human behavior and streamlined pipelines for automatic processing of data. However, studies of infants and children have not fully reaped these benefits, because the behaviors of interest, such as gaze duration and direction, even when collected online, still have to be extracted from video through a laborious process of manual annotation. Recent advances in computer vision raise the possibility of automated annotation of such video data. In this paper, we built on a system for automatic gaze annotation in human infants, iCatcher (Erel et al., 2022), by engineering improvements and then training and testing the system (hereafter, iCatcher+) on two datasets with substantial video and participant variability (214 videos collected in lab and mobile testing centers, and 265 videos collected via webcams in homes; infants and children aged 4 months to 3.5 years). We found that when trained on each of these video datasets, iCatcher+ performed with near human-level accuracy on held-out videos from both datasets, distinguishing “LEFT” versus “RIGHT” and “ON” versus “OFF” looking behavior. This high performance was achieved at the level of individual frames, experimental trials, and study videos; held across participant demographics (e.g., age, race/ethnicity) and video characteristics (e.g., resolution, luminance); and generalized to a third, entirely held-out dataset. We close by discussing the next steps required to fully automate the lifecycle of online infant and child behavioral studies, a key step toward enabling rapid, high-powered developmental research.