Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016
DOI: 10.18653/v1/n16-1181
Learning to Compose Neural Networks for Question Answering

Abstract: We describe a question answering model that applies to both images and structured knowledge bases. The model uses natural language strings to automatically assemble neural networks from a collection of composable modules. Parameters for these modules are learned jointly with network-assembly parameters via reinforcement learning, with only (world, question, answer) triples as supervision. Our approach, which we term a dynamic neural module network, achieves state-of-the-art results on benchmark datasets in both…
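The core idea in the abstract — assembling a network from composable modules according to a question-derived layout — can be illustrated with a minimal sketch. The module names (`find`, `describe`) and the dictionary "world" below are hypothetical stand-ins, not the authors' implementation, which uses learned neural modules and a learned layout predictor.

```python
# Hedged sketch of dynamic module composition over a toy "world"
# (a dict mapping entities to attribute sets). Module names are
# illustrative only.

def find(world, concept):
    # Toy attention module: score each entity for a concept.
    return {entity: 1.0 if concept in feats else 0.0
            for entity, feats in world.items()}

def describe(world, attention):
    # Toy answer module: return the most-attended entity.
    return max(attention, key=attention.get)

def assemble(layout):
    # Turn a layout tree like ("describe", ("find", "red"))
    # into a single callable network over a world representation.
    op, arg = layout
    if op == "find":
        return lambda world: find(world, arg)
    if op == "describe":
        sub = assemble(arg)
        return lambda world: describe(world, sub(world))
    raise ValueError(f"unknown module: {op}")

world = {"ball": {"red", "round"}, "box": {"blue", "square"}}
network = assemble(("describe", ("find", "red")))
print(network(world))  # -> ball
```

In the actual model, each module is a parameterized neural component and the layout itself is predicted from the question, with both sets of parameters trained jointly by reinforcement learning from (world, question, answer) triples.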

Cited by 404 publications (365 citation statements)
References 25 publications
“…NSM is similar to Neural Programmer (Neelakantan et al., 2015) and Dynamic Neural Module Network (Andreas et al., 2016) in that they all solve the problem of semantic parsing from structured data, and generate programs using similar semantics. The main difference between these approaches is how an intermediate result (the memory) is represented.…”
Section: Related Work (citation type: mentioning, confidence: 99%)
“…Components of this pipeline can be trained independently (Sections 5.2 and 5.3) or jointly as a single end-to-end model (Section 5.4). This division of labor also allows for differing amounts of human intervention both during training and in the interpretation of actions, and bears some resemblance to (Andreas et al., 2016). Specifically, we will first present results where the model predicts a fixed semantic interpretation of actions which are easily human-interpretable (Encoder + Representation).…”
Section: Model Architecture (citation type: mentioning, confidence: 99%)
“…In particular, different VQA models have focused on how they integrate the question and image inputs in the model. Various VQA techniques were reviewed in [15], where the recent approaches were found to be:
− using Bayesian models to exploit the underlying relationships between question-image-answer feature distributions [28];
− using the question to break the VQA task into a sequence of modular sub-problems [2]. For example, the question "what is the major class in the image?"…”
Section: Related Work (citation type: mentioning, confidence: 99%)
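The second approach quoted above — using the question to break the VQA task into modular sub-problems — can be sketched with a toy, rule-based layout predictor. The keyword rules and module names below are hypothetical illustrations; the cited work [2] learns this mapping from a dependency parse of the question rather than using hand-written rules.

```python
# Hedged sketch: map a question to a module layout via keyword rules,
# a hand-written stand-in for the learned layout predictor in [2].

def parse_layout(question):
    q = question.lower().rstrip("?")
    if q.startswith("how many"):
        # Counting questions compose a count module over an attention module.
        return ("count", ("find", q.split()[-1]))
    if "what" in q and "class" in q:
        # Classification questions attend to the scene, then classify it.
        return ("classify", ("attend", "scene"))
    # Default: describe whatever the final word refers to.
    return ("describe", ("find", q.split()[-1]))

print(parse_layout("what is the major class in the image?"))
# -> ('classify', ('attend', 'scene'))
print(parse_layout("how many dogs?"))
# -> ('count', ('find', 'dogs'))
```

Each layout would then be instantiated as a composed network (as in the dynamic neural module network of the main paper) and executed against the image features to produce an answer.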