This paper presents a collection of tools (and adaptors for existing tools) that we have recently developed to support the acquisition, annotation and analysis of multimodal corpora. For acquisition, we offer an extensible architecture that integrates various sensors, based on existing connectors (e.g. for motion capture via VICON or ART) and on connectors we contribute (for motion tracking via Microsoft Kinect and eye tracking via Seeingmachines FaceLAB 5). The architecture provides live visualisation of the multimodal data in a unified virtual reality (VR) view (using Fraunhofer Instant Reality) for monitoring during recordings, and enables the recording of synchronised streams. For annotation, we provide a connection between the annotation tool ELAN (MPI Nijmegen) and the VR visualisation. For analysis, we provide routines in the programming language Python that read in and manipulate (aggregate, transform, plot, analyse) the sensor data, as well as textual annotation formats (Praat TextGrids). As we discuss, use of this toolset in multimodal studies has proved efficient and effective. We make the collection available as open source for use by other researchers.
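The Python analysis routines themselves are not reproduced in the abstract; as a minimal sketch of the kind of aggregation they perform, the snippet below averages a sensor stream over TextGrid-style labelled intervals. The interval and sensor values are invented illustration data, and `mean_per_interval` is a hypothetical helper, not a function from the released toolset.

```python
from statistics import mean

# TextGrid-style annotation intervals: (start_s, end_s, label)
intervals = [(0.0, 0.4, "hello"), (0.4, 1.0, "world")]

# Synchronised sensor stream: (timestamp_s, value),
# e.g. head-pitch samples from a motion tracker
samples = [(0.1, 2.0), (0.3, 4.0), (0.5, 6.0), (0.9, 8.0)]

def mean_per_interval(intervals, samples):
    """Mean sensor value within each labelled interval."""
    result = {}
    for start, end, label in intervals:
        vals = [v for t, v in samples if start <= t < end]
        result[label] = mean(vals) if vals else None
    return result

print(mean_per_interval(intervals, samples))
# {'hello': 3.0, 'world': 7.0}
```

The same pattern generalises to any timestamped stream (gaze, motion capture) aligned against any tier of interval annotations.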
That speakers take turns in interaction is a fundamental fact across languages and speaker communities. How this turn-taking is organised is less clearly established. We have looked at interactions recorded in the field using the same task in a set of three genetically and regionally diverse languages: Georgian, Cabécar, and Fongbe. As in previous studies, we find evidence in all languages for the avoidance of gaps and overlaps in floor transitions, but we also find contrasts between the languages on these features. Further, we observe that interlocutors align on these temporal features in all three languages. (We show this by correlating speaker averages of temporal features, as has been done before, and further ground it by ruling out potential alternative explanations, which is novel and a minor methodological contribution.) The universality of smooth turn-taking and of alignment, despite potentially relevant grammatical differences, suggests that the different resources each of these languages makes available are nevertheless used to achieve the same effects. This finding has potential consequences both from a theoretical point of view and for modelling such phenomena in conversational agents.
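The alignment measure can be sketched as a correlation of per-speaker averages of floor-transition offsets (positive values are gaps, negative values are overlaps). The dyad data below are invented for illustration, and the `pearson` helper is a generic hypothetical implementation, not the paper's actual analysis code.

```python
from statistics import mean

# Invented per-dyad mean floor-transition offsets in seconds
# (positive = gap, negative = overlap); one value pair per conversation.
speaker_a = [0.20, 0.05, -0.10, 0.30, 0.15]
speaker_b = [0.18, 0.10, -0.05, 0.25, 0.12]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    varx = sum((x - mx) ** 2 for x in xs)
    vary = sum((y - my) ** 2 for y in ys)
    return cov / (varx * vary) ** 0.5

r = pearson(speaker_a, speaker_b)
print(round(r, 3))  # close to 1 here: interlocutors' averages co-vary
```

A high positive coefficient across dyads is the signature of alignment; ruling out alternative explanations (e.g. task or recording-session effects) is then the additional grounding step.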