In order to be able to answer a natural language question, a computational system needs three main capabilities. First, the system needs to be able to analyze the question into a structured query, revealing its component parts and how these are combined. Second, it needs to have access to relevant knowledge sources, such as databases, texts or images. Third, it needs to be able to execute the query on these knowledge sources. This paper focuses on the first capability, presenting a novel approach to semantically parsing questions expressed in natural language. The method makes use of a computational construction grammar model for mapping questions onto their executable semantic representations. We demonstrate and evaluate the methodology on the CLEVR visual question answering benchmark task. Our system achieves a 100% accuracy, effectively solving the language understanding part of the benchmark task. Additionally, we demonstrate how this solution can be embedded in a full visual question answering system, in which a question is answered by executing its semantic representation on an image. The main advantages of the approach include (i) its transparent and interpretable properties, (ii) its extensibility, and (iii) the fact that the method does not rely on any annotated training data.
This paper investigates the role of tutor feedback in language learning using computational models. We compare two dominant paradigms in language learning: interactive learning and cross-situational learning -which differ primarily in the role of social feedback such as gaze or pointing. We analyze the relationship between these two paradigms and propose a new mixed paradigm that combines the two paradigms and allows to test algorithms in experiments that combine no feedback and social feedback. To deal with mixed feedback experiments, we develop new algorithms and show how they perform with respect to traditional knn and prototype approaches.
Autonomous agents perceive the world through streams of continuous sensori-motor data. Yet, in order to reason and communicate about their environment, agents need to be able to distill meaningful concepts from their raw observations. Most current approaches that bridge between the continuous and symbolic domain are using deep learning techniques. While these approaches often achieve high levels of accuracy, they rely on large amounts of training data, and the resulting models lack transparency, generality, and adaptivity. In this paper, we introduce a novel methodology for grounded concept learning. In a tutor-learner scenario, the method allows an agent to construct a conceptual system in which meaningful concepts are formed by discriminative combinations of prototypical values on human-interpretable feature channels. We evaluate our approach on the CLEVR dataset, using features that are either simulated or extracted using computer vision techniques. Through a range of experiments, we show that our method allows for incremental learning, needs few data points, and that the resulting concepts are general enough to be applied to previously unseen objects and can be combined compositionally. These properties make the approach well-suited to be used in robotic agents as the module that maps from continuous sensory input to grounded, symbolic concepts that can then be used for higher-level reasoning tasks.
Constructionist approaches to language make use of form-meaning pairings, called constructions, to capture all linguistic knowledge that is necessary for comprehending and producing natural language expressions. Language processing consists then in combining the constructions of a grammar in such a way that they solve a given language comprehension or production problem. Finding such an adequate sequence of constructions constitutes a search problem that is combinatorial in nature and becomes intractable as grammars increase in size. In this paper, we introduce a neural methodology for learning heuristics that substantially optimise the search processes involved in constructional language processing. We validate the methodology in a case study for the CLEVR benchmark dataset. We show that our novel methodology outperforms state-of-the-art techniques in terms of size of the search space and time of computation, most markedly in the production direction. The results reported on in this paper have the potential to overcome the major efficiency obstacle that hinders current efforts in learning large-scale construction grammars, thereby contributing to the development of scalable constructional language processing systems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.