Assessing infant carrying and holding (C/H), or physical infant-caregiver interaction, is important for a wide range of contexts in development research. An automated detection and quantification of infant C/H is particularly needed in long term at-home studies where development of infants’ neurobehavior is measured using wearable devices. Here, we first developed a phenomenological categorization for physical infant-caregiver interactions to support five different definitions of C/H behaviors. Then, we trained and assessed deep learning-based classifiers for their automatic detection from multi-sensor wearable recordings that were originally used for mobile assessment of infants’ motor development. Our results show that an automated C/H detection is feasible at few-second temporal accuracy. With the best C/H definition, the automated detector shows 96% accuracy and 0.56 kappa, which is slightly less than the video-based inter-rater agreement between trained human experts (98% accuracy, 0.77 kappa). The classifier performance varies with C/H definition reflecting the extent to which infants’ movements are present in each C/H variant. A systematic benchmarking experiment shows that the widely used actigraphy-based method ignores the normally occurring C/H behaviors. Finally, we show proof-of-concept for the utility of the novel classifier in studying C/H behavior across infant development. Particularly, we show that matching the C/H detections to individuals’ gross motor ability discloses novel insights to infant-parent interaction.