Grounded Semantic Composition for Visual Scenes

Gorniak, Peter; Roy, Deb

doi:10.1613/jair.1327

Cited by 96 publications

(85 citation statements)

References 31 publications

“…In some cases, for example Kievit et al (2001), Salmon-Alt and Romary (2001), Landragin and Romary (2003), Kelleher et al (2005), the initial set of candidates referents is restricted to a sub-set of the context based on preferences with respect to the mode of interpretation relative to the form of reference. In other frameworks, for example Gorniak and Roy (2004), candidate referents are incrementally excluded from consideration as the resolution process progresses due to the sequential manner that the semantics of the terms within the reference are processed.…”

Section: S1 (mentioning)

confidence: 99%

“…McKevitt (1996) provides an excellent collection of papers on early systems. Recent systems that focus on multimodal reference resolution include: Kievit et al (2001), Salmon-Alt and Romary (2001), Landragin and Romary (2003), Gorniak and Roy (2004) and Kelleher et al (2005). Kievit et al (2001) define separate resolution strategies for each form of referring expression.…”

Section: Related Work (mentioning)

confidence: 99%

“…As a result, the system cannot recognise situations where a reference may be ambiguous between two entities in different sub-contexts, and, consequently, it may resolve a reference incorrectly rather than initiate a clarification process. Gorniak and Roy (2004) focus on the resolution of references containing spatial descriptions. They propose a feed-forward filtering process to reference resolution.…”

Section: Related Work (mentioning)

confidence: 99%

“…In these frameworks the resolution process involves: (1) the construction of an underspecified reference domain, using templates associated with the form of the reference given; (2) the unification of this underspecified domain with a suitable reference domain within the context model; (3) the selection of one of the elements within the unified reference domain to function as the referent. However, similar to the frameworks proposed in Kievit et al (2001) and Gorniak and Roy (2004), there is the potential for these frameworks to overcommit to a particular subset of the context during the resolution process. As the resolution process occurs within a sub-context, whose selection is at least partially driven by the form of the reference being interpreted, if the wrong reference domain is selected the intended target object and/or plausible distractor referents, that may indicate the need for reference clarification, may be excluded from consideration.…”

Section: Related Work (mentioning)

confidence: 99%

Attention driven reference resolution in multimodal contexts

Kelleher

2007

Artif Intell Rev

Abstract. In recent years a number of psycholinguistic experiments have pointed to the interaction between language and vision, in particular the interaction between visual attention and linguistic reference. In parallel with this, several theories of discourse have attempted to provide an account of the relationship between types of referential expressions on the one hand and the degree of mental activation on the other. Building on both of these traditions, this paper describes an attention-based approach to visually situated reference resolution. The framework uses the relationship between referential form and preferred mode of interpretation as a basis for a weighted integration of linguistic and visual attention scores for each entity in the multimodal context. The resulting integrated attention scores are then used to rank the candidate referents during the resolution process, with the highest-scoring candidate selected as the referent. One advantage of this approach is that the resolution process occurs within the full multimodal context, in so far as the referent is selected from a full list of the objects in the multimodal context. As a result, situations where the intended target of the reference is erroneously excluded, due to an individual assumption within the resolution process, are avoided. Moreover, the system can recognise situations where attention cues from different modalities make a reference potentially ambiguous.
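The weighted integration and ranking described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Candidate` type, the score ranges, the weight values, and the `margin` used to flag ambiguity are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One entity in the multimodal context (hypothetical structure)."""
    name: str
    linguistic: float  # assumed linguistic attention score in [0, 1]
    visual: float      # assumed visual attention score in [0, 1]


def rank_candidates(candidates, w_linguistic, w_visual):
    """Return (candidate, integrated score) pairs, highest score first.

    Every entity in the context is scored; none is filtered out beforehand.
    """
    scored = [
        (c, w_linguistic * c.linguistic + w_visual * c.visual)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def resolve(candidates, w_linguistic, w_visual, margin=0.05):
    """Pick the top-ranked candidate and flag a potentially ambiguous result.

    The `margin` test is an assumption of this sketch: the reference is
    flagged as ambiguous when the two best scores are closer than `margin`.
    """
    if not candidates:
        raise ValueError("the context contains no candidate referents")
    ranked = rank_candidates(candidates, w_linguistic, w_visual)
    best, best_score = ranked[0]
    ambiguous = len(ranked) > 1 and (best_score - ranked[1][1]) < margin
    return best, ambiguous


# Hypothetical context: weights would depend on the form of the reference.
context = [
    Candidate("red_box", linguistic=0.2, visual=0.9),
    Candidate("blue_ball", linguistic=0.8, visual=0.1),
    Candidate("green_cone", linguistic=0.1, visual=0.1),
]
referent, ambiguous = resolve(context, w_linguistic=0.75, w_visual=0.25)
```

Because the ranking runs over the full candidate list, changing the weights changes which entity wins without ever excluding an entity from consideration, which is the property the abstract contrasts with filtering approaches.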

“…Behaviour-based approaches are "data-driven" because the result of the learning process is determined largely by low-level features in the environment and less by any pre-defined knowledge in an ontology. Recent work in concept formation involving symbol grounding includes [12,13].…”

Section: Agent Architectures To Support Grounding (mentioning)

confidence: 99%

Intelligent Management of Data Driven Simulations to Support Model Building in the Social Sciences

Kennedy

Theodoropoulos

2006

Computational Science – ICCS 2006

Abstract. Artificial intelligence (AI) can contribute to the management of a data driven simulation system, in particular with regard to adaptive selection of data and refinement of the model on which the simulation is based. We consider two different classes of intelligent agent that can control a data driven simulation: (a) an autonomous agent using internal simulation to test and refine a model of its environment and (b) an assistant agent managing a data-driven simulation to help humans understand a complex system (assisted model-building). In the first case the agent is situated in its environment and can use its own sensors to explore the data sources. In the second case, the agent has much less independent access to data and may have limited capability to refine the model on which the simulation is based. This is particularly true if the data contains subjective statements about the human view of the world, such as in the social sciences. For complex systems involving human actors, we propose an architecture in which assistant agents cooperate with autonomous agents to build a more complete and reliable picture of the observed system.

Artificial Intelligence

2004

Encyclopedic Dictionary of Genetics, Genomics and Proteomics

A theoretical framework for grounding language is introduced that provides a computational path from sensing and motor action to words and speech acts. The approach combines concepts from semiotics and schema theory to develop a holistic approach to linguistic meaning. Schemas serve as structured beliefs that are grounded in an agent's physical environment through a causal-predictive cycle of action and perception. Words and basic speech acts are interpreted in terms of grounded schemas. The framework reflects lessons learned from implementations of several language processing robots. It provides a basis for the analysis and design of situated, multimodal communication systems that straddle symbolic and non-symbolic realms.
