Studying language in its natural context is one of the new challenges for natural language processing, as well as for linguistics in general. Much work has been done from the perspective of spoken language processing, even though the issues in this domain remain largely unsolved (disfluencies, ill-formedness, etc.). But the problem becomes even harder when trying to take into account all the aspects of natural communication, including pragmatics and gestures. In this case, we need to describe many different sources of information (let us call them linguistic domains) coming from the signal (prosody, phonetics), from the transcription (morphology, syntax, lexical semantics), and from the behavior of the conversation partners (gestures, attitudes, etc.), as well as from the contextual background. Taking such a rich environment into account means that language is seen in its multimodal dimension, which requires a full description of each verbal or non-verbal domain as well as of their interactions. Such a description is obviously a prerequisite for the elaboration of a multimodal theory of language. It is also a basis for the development of parsing tools and annotation devices. Both goals rely on the availability of annotated resources providing information on all the different domains and modalities. This is the goal of the project described here, which led to the development of a large annotated multimodal corpus called CID (Corpus of Interactional Data).

In this article, we first present the context of multimodality and the issues we face when building multimodal resources. In the second part, we present in more detail the organization of the project during which the CID corpus was built. The rest of the paper describes the solutions we propose to what we consider the main issues for multimodal annotation, namely the annotation scheme, the alignment between the different domains, and the interoperability of the different sources of information.

Multimodal Interaction and its Annotation

Our work aims at collecting data in natural situations, with audio and video recordings of human interaction, focusing on language and gestures to the exclusion of other kinds of modalities, be they natural (smell, touch) or artificial (related to human-machine interaction, for example). More specifically, what interests us when studying such domains is the interaction between the different sources of information. Indeed, we think that (1) meaning comes from the interplay of different dimensions such as prosody, lexicon, gestures,