2004
DOI: 10.1613/jair.1327

Grounded Semantic Composition for Visual Scenes

Abstract: We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex referring expressions. The model has been implemented, and it is able to understand a broad range of spatial referring expressions. We describe our implementation of word level visually-grounded semantics and their embedding in a compositional parsing framework. The implemented …

Citations: cited by 96 publications (85 citation statements)
References: 31 publications
“…In some cases, for example Kievit et al (2001), Salmon-Alt and Romary (2001), Landragin and Romary (2003), Kelleher et al (2005), the initial set of candidates referents is restricted to a sub-set of the context based on preferences with respect to the mode of interpretation relative to the form of reference. In other frameworks, for example Gorniak and Roy (2004), candidate referents are incrementally excluded from consideration as the resolution process progresses due to the sequential manner that the semantics of the terms within the reference are processed.…”
Section: S1; citation type: mentioning
confidence: 99%
“…McKevitt (1996) provides an excellent collection of papers on early systems. Recent systems that focus on multimodal reference resolution include: Kievit et al (2001), Salmon-Alt and Romary (2001), Landragin and Romary (2003), Gorniak and Roy (2004) and Kelleher et al (2005). Kievit et al (2001) define separate resolution strategies for each form of referring expression.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…Behaviour-based approaches are "data-driven" because the result of the learning process is determined largely by low-level features in the environment and less by any pre-defined knowledge in an ontology. Recent work in concept formation involving symbol grounding includes [12,13].…”
Section: Agent Architectures To Support Grounding; citation type: mentioning
confidence: 99%