2019
DOI: 10.48550/arxiv.1905.10226
Preprint
Deep Reason: A Strong Baseline for Real-World Visual Reasoning

Abstract: This paper presents a strong baseline for real-world visual reasoning (GQA), which achieved 60.93% accuracy in the GQA 2019 challenge and won sixth place. GQA is a large dataset with 22M questions involving spatial understanding and multi-step inference. To help further research in this area, we identify three crucial components that improve performance: multi-source features, a fine-grained encoder, and a score-weighted ensemble. We provide a series of analyses of their impact on performance.
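The abstract names a score-weighted ensemble as one of the three components but does not spell out how the model outputs are combined. A minimal sketch is shown below, assuming each model produces a probability distribution over candidate answers and that the weights come from per-model validation scores; the function name, the weighting scheme, and the use of NumPy are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def score_weighted_ensemble(answer_probs, val_scores):
    """Combine per-model answer distributions with score-derived weights.

    answer_probs: list of (num_questions, num_answers) arrays, one per model.
    val_scores: per-model validation scores used as ensemble weights (assumed).
    Returns the index of the highest-weighted answer for each question.
    """
    weights = np.asarray(val_scores, dtype=np.float64)
    weights = weights / weights.sum()                 # normalize weights to sum to 1
    stacked = np.stack(answer_probs, axis=0)          # (num_models, Q, A)
    combined = np.tensordot(weights, stacked, axes=1) # weighted sum over models -> (Q, A)
    return combined.argmax(axis=-1)                   # predicted answer index per question

# Example with three hypothetical models, 2 questions, 4 candidate answers.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(4), size=2) for _ in range(3)]
print(score_weighted_ensemble(probs, val_scores=[0.58, 0.60, 0.61]))
```

Weighting by a held-out score rather than averaging uniformly lets stronger members dominate the ensemble; whether the paper uses exactly this scheme is not stated in the abstract.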

Cited by 1 publication (1 citation statement)
References 10 publications
“…GQA Models: GQA was introduced in [17] for real-world visual reasoning. Simple monolithic networks [40], the MAC network [16], and language-conditioned graph neural networks [15,11] have been developed for this task. LXMERT [37], a large-scale pre-trained encoder, has also been tested on this dataset.…”
Section: Related Work (mentioning)
confidence: 99%