2006
DOI: 10.1613/jair.1936

Cognitive Principles in Robust Multimodal Interpretation

Abstract: Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech and gesture. To build effective multimodal interfaces, automated interpretation of user multimodal inputs is important. Inspired by previous investigations of cognitive status in multimodal human-machine interaction, we have developed a greedy algorithm for interpreting user referring expressions (i.e., multimodal reference resolution). This algorithm incorpo…

Cited by 8 publications (7 citation statements)
References 40 publications
“…Building on this work, Chai, Hong, and Zhou (2004) proposed a probabilistic graph-matching algorithm for resolving referring expressions that are complex (involving multiple target referents) and ambiguous (involving gestures that could indicate multiple candidate referents) in multimodal user interfaces. Because this algorithm had high computational complexity, Chai, Prasov, and Qu (2006) demonstrated how the algorithm's performance could be improved using a greedy algorithm based on the theories of Conversational Implicature (Dale & Reiter, 1995; Grice, 1975) and the GH. Chai et al. combine these theories to create a reduced hierarchy: Gesture ⊆ Focus ⊆ Visible ⊆ Others, where Focus combines the "in focus" and "activated" tiers of the GH, and Visible combines its "familiar" and "uniquely identifiable" tiers.…”
Section: Related Work
confidence: 99%
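The reduced hierarchy quoted above maps naturally onto a greedy resolver: prefer referents from the most cognitively accessible tier first and stop once the expression's cardinality is satisfied. The following is a minimal Python sketch of that idea, not the authors' implementation; the tier names, the `compatible` filter, and the data layout are illustrative assumptions.

```python
# Tiers of the reduced hierarchy, ordered from most to least
# cognitively accessible: Gesture <= Focus <= Visible <= Others.
TIERS = ["gesture", "focus", "visible", "others"]

def compatible(expression, candidate):
    """Toy semantic filter: every attribute constraint extracted from the
    referring expression must match the candidate object (a stand-in for
    the paper's richer compatibility measure)."""
    return all(candidate.get(k) == v
               for k, v in expression["constraints"].items())

def resolve(expression, tiered_candidates):
    """Greedily pick referents: walk the hierarchy from the most
    accessible tier down, taking compatible candidates until the
    expression's cardinality is met."""
    needed = expression.get("count", 1)
    chosen = []
    for tier in TIERS:
        for obj in tiered_candidates.get(tier, []):
            if compatible(expression, obj):
                chosen.append(obj)
                if len(chosen) == needed:
                    return chosen
    return chosen  # possibly partial if too few compatible referents

# Usage: "these two red houses" accompanied by a gesture at two objects.
candidates = {
    "gesture": [{"id": "h1", "type": "house", "color": "red"},
                {"id": "h2", "type": "house", "color": "red"}],
    "visible": [{"id": "h3", "type": "house", "color": "blue"}],
}
expr = {"constraints": {"type": "house", "color": "red"}, "count": 2}
print([o["id"] for o in resolve(expr, candidates)])  # ['h1', 'h2']
```

The greedy tier walk is what avoids the combinatorial cost of graph matching: each candidate is examined at most once, in accessibility order, rather than jointly against all possible referent assignments.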
“…There has been considerable work on reference resolution more generally, and on applying various theories from cognitive science in order to use pragmatics (Schüller 2014; Richard-Bollans, Gomez Alvarez, and Cohn 2017; Kehler 2000; Chai, Prasov, and Qu 2006; Van Deemter 2016). Unfortunately, much of this work focuses on pragmatics as it pertains to processing effort and cognitive effect on the agent, and less so on situational aspects of the agent's surroundings.…”
Section: Related Work
confidence: 99%
“…In [4] the interpretation process is defined using an edit-based transducer combined with a finite-state-based interpreter, which works directly on lattice inputs. The "referent resolution approach" used in [5] is another approach to multimodal interpretation: it finds the most appropriate referents, i.e., the specific object or objects to which the user's input must be matched, for the referring expressions in the user's input.…”
Section: Related Work
confidence: 99%
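The referent-resolution idea quoted above, matching user input against candidate objects, can also be framed as ranking. Below is a small hypothetical scorer combining speech-derived attribute constraints with gesture evidence; the additive weighting and field names are assumptions for illustration, not the method of [5].

```python
def score(expression, candidate, gesture_salience):
    """Rank a candidate referent: count matched attribute constraints
    from the spoken expression, then add gesture evidence for this
    object (0.0 if it was not gestured at)."""
    attr_match = sum(candidate.get(k) == v
                     for k, v in expression["constraints"].items())
    return attr_match + gesture_salience.get(candidate["id"], 0.0)

def best_referent(expression, candidates, gesture_salience):
    """Return the candidate object that best matches the user's input."""
    return max(candidates, key=lambda c: score(expression, c, gesture_salience))

# Example: "the red house" spoken while pointing near h2.
objs = [{"id": "h1", "type": "house", "color": "red"},
        {"id": "h2", "type": "house", "color": "red"},
        {"id": "h3", "type": "house", "color": "blue"}]
expr = {"constraints": {"type": "house", "color": "red"}}
print(best_referent(expr, objs, {"h2": 0.8})["id"])  # h2
```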