2006
DOI: 10.1016/j.sigpro.2006.02.046

Visual perception, language and gesture: A model for their understanding in multimodal dialogue systems

Frédéric Landragin. Visual perception, language and gesture: A model for their understanding in multimodal dialogue systems. Signal Processing, Elsevier, 2006, 86 (12).

Abstract: The way we see the objects around us determines the speech and gestures we use to refer to them. The gestures we produce structure our visual perception. The words we use influence the way we see. In this manner, visual perception, language and gesture interact with each other in multiple ways. The pro…

Cited by 22 publications (16 citation statements)
References 9 publications
“…While up to 57 potential referents were on the board at any given time, speakers and addressees only considered those that had been mentioned recently, that were relevant to the task, and that were in close physical proximity to the last mentioned object. Similar task-based constraints have been found to constrain referring in other task-related conversations (Beun & Cremers, 1998; also see Landragin, 2006), suggesting these effects are not limited to the particular task used in this study. Lexical competition during spoken word recognition can be attenuated by other constraints as well, including semantic information (Barr, 2008), talker preferences (e.g., if one talker always says candy and a different talker always says candle; Creel et al., 2008), and structural priming of verbs (Thothathiri & Snedeker, 2008).…”
Section: Actions and Gestures (supporting)
confidence: 80%
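The three constraints in this statement (recency of mention, relevance to the task, and physical proximity to the last mentioned object) can be read as a simple filter over candidate referents. The sketch below is purely illustrative and not taken from Landragin (2006) or the citing study; the data structure, function names, and threshold values are hypothetical assumptions.

# Illustrative sketch (hypothetical names and thresholds): restricting a
# referential domain by the three constraints discussed above: recency of
# mention, task relevance, and proximity to the last mentioned object.
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    last_mention_turn: int   # dialogue turn of last mention, -1 if never mentioned
    task_relevant: bool      # does it matter for the current task step?
    position: tuple          # (x, y) coordinates on the board

def restrict_domain(candidates, current_turn, last_mentioned_pos,
                    recency_window=5, max_distance=1.0):
    """Keep only candidates satisfying all three constraints."""
    kept = []
    for c in candidates:
        recently_mentioned = (c.last_mention_turn >= 0
                              and current_turn - c.last_mention_turn <= recency_window)
        nearby = math.dist(c.position, last_mentioned_pos) <= max_distance
        if recently_mentioned and c.task_relevant and nearby:
            kept.append(c)
    return kept

In a real dialogue system these hard thresholds would more plausibly be graded salience scores, but the filter form mirrors the constraints reported in the statement above.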
“…In such settings, the referential domain, or the domain of interpretation for the referring expression, can be easily identified as the set of objects shown in the display. Identification of the referential domain in unscripted conversation is likely to be significantly more complex [41], particularly when the set of potential referents is large, when the interlocutors have different perspectives on the referential domain, and when the potential discourse referents have different affordances. For example, Chambers and colleagues [42] tested whether the affordances of a potential referent and their consistency with a spatial preposition guided listeners' interpretation of instructions.…”
Section: Referential Domains (mentioning)
confidence: 99%
“…There are several areas of research that are relevant to our work, the first one being the vast literature on multimodality; in this paper we focus only on multimodal referring expressions. As is well known (Sinclair, 1992; Kehler, 2000; Goldin-Meadow, 2005; Landragin, 2006; Navarretta, 2011), in natural dialogue the antecedents of linguistic referring expressions are often introduced via gestures; for example, in our environment the user can point to a street intersection on a map without ever having mentioned it before. Crucially, from a computational point of view, including hand-gesture information improves the performance of the reference resolution module (Eisenstein and Davis, 2006; Baldwin et al., 2009).…”
Section: Related Work (mentioning)
confidence: 96%
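The last statement points out that adding hand-gesture information improves reference resolution. As a purely illustrative sketch (not the module of Eisenstein and Davis (2006) or Baldwin et al. (2009)), a multimodal resolver can combine a linguistic match score with a gesture proximity score; the scoring functions and the weighting below are hypothetical assumptions.

# Illustrative sketch (hypothetical scoring and weights): resolving a referring
# expression by combining linguistic attributes with a pointing gesture.
import math

def resolve_reference(candidates, description_words, pointing_target, w_gesture=0.6):
    """Return the candidate best matching both the words and the pointing gesture.

    candidates: list of dicts with 'name', 'attributes' (a set of words), 'position' (x, y)
    description_words: set of words from the referring expression, e.g. {"street", "intersection"}
    pointing_target: (x, y) location indicated by the gesture, or None if no gesture was produced
    """
    def score(c):
        # Linguistic score: fraction of the description matched by the candidate's attributes.
        ling = len(description_words & c["attributes"]) / max(len(description_words), 1)
        if pointing_target is None:
            return ling
        # Gesture score: the closer the candidate is to the pointed location, the higher the score.
        gest = 1.0 / (1.0 + math.dist(c["position"], pointing_target))
        return (1 - w_gesture) * ling + w_gesture * gest

    return max(candidates, key=score)

For the map example in the quoted passage, a bare pointing gesture with an empty description would then be resolved purely by proximity to the pointed intersection.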