A key question for understanding speech evolution is whether the vocalizations of our closest living relatives, nonhuman primates, represent the precursors to speech. Some believe that primate vocalizations are not volitional but are instead inextricably linked to internal states like arousal and thus bear little resemblance to human speech. Others disagree, arguing that the strategic use of vocalizations by many primates demonstrates a degree of voluntary vocal control. In the current study, we present a behavioral paradigm that reliably elicits different types of affiliative vocalizations from marmoset monkeys while measuring their heart rate fluctuations using noninvasive electromyography. By modulating both the physical distance between marmosets and the sensory information available to them, we find that arousal levels are linked, but not inextricably, to vocal production. Different arousal levels are, generally, associated with changes in vocal acoustics and the drive to produce different call types. However, in contexts where marmosets are interacting, the production of these different call types is also affected by extrinsic factors such as the timing of a conspecific's vocalization. These findings suggest that variability in vocal output as a function of context might reflect trade-offs between the drive to perpetuate vocal contact and conserving energy.