As infants interact with the object world, they generate rich information about object properties and functions. Much of infant learning unfolds in the presence of caregivers, who talk about and act on the objects of infant play. Does mother joint engagement correspond to real-time changes in the complexity and duration of infant object interactions? We observed 38 mothers and their first-born infants (cross-sectional; 13, 18, and 23 months) during 2 h of everyday activity as infants freely navigated their home environments. Behavioral coding explored thousands of infant object interactions within and outside mother joint engagement. Bouts involving exclusively simple play were shorter than bouts involving complex play. Critically, mothers' multimodal input (i.e., touching/gesturing toward and talking about the focal object) corresponded with more complex and longer play bouts than when mothers provided no input. Bouts involving complex play and multimodal input lasted 7.5 times longer than simple play bouts absent mother input. Moreover, "action-orienting talk" (e.g., "Twist it", "Feed dolly"), rather than talk per se, corresponded with longer and more complex play bouts. Notably, the association between joint engagement and play duration was not a function of mothers having more time to join. Analyses that eliminated short infant bouts and considered the timing of mothers' behaviors confirmed that mother input "extended" the duration of play bouts. As infants actively explore their environments, their object interactions change moment to moment in the presence of mothers' multimodal engagement.