Proceedings of the 7th International Conference on Multimodal Interfaces 2005
DOI: 10.1145/1088463.1088489

Probabilistic grounding of situated speech using plan recognition and reference resolution

Abstract: Situated, spontaneous speech may be ambiguous along acoustic, lexical, grammatical and semantic dimensions. To understand such a seemingly difficult signal, we propose to model the ambiguity inherent in acoustic signals and in lexical and grammatical choices using compact, probabilistic representations of multiple hypotheses. To resolve semantic ambiguities we propose a situation model that captures aspects of the physical context of an utterance as well as the speaker's intentions, in our case represented by …

Cited by 35 publications (15 citation statements)
References 17 publications
“…A related line of work focuses on grounding referring expressions to referents in 3D worlds with simple colored geometric shapes (Gorniak and Roy, 2004; Gorniak and Roy, 2005). More recent work grounds text to object attributes such as color and shape in images (Matuszek et al, 2012; Krishnamurthy and Kollar, 2013).…”
Section: Related Tasks (mentioning)
confidence: 99%
“…Quickset (Cohen, et al, 1997), FUSS (Gorniak & Roy, 2005) and (Tue Vo & Wood, 1996) are examples of such direct manipulation systems in which speech is combined with a pen based interface. For virtual environments some examples are (McGlashan, 1995; Muller, et al, 1998; Cernak & Sannier, 2002; Kaiser, et al, 2003), they are usually combined with some form of direct manipulation (e.g.…”
Section: Related Work (mentioning)
confidence: 99%
“…Multiple Interpretations The particular implementation discussed here uses the best interpretation of an utterance exclusively. In previous work we have shown ways to consider multiple weighted interpretations simultaneously by probabilistically mixing the linguistic elements from the language parser with the affordances produced by the structural grammar (Gorniak & Roy, 2005a). It would clearly be beneficial to adapt those methods to the system described here to consider multiple word and constituent meanings and their interpretations simultaneously.…”
Section: Action Markers (mentioning)
confidence: 99%