MEMO: A Deep Network for Flexible Combination of Episodic Memories
Preprint, 2020
DOI: 10.48550/arxiv.2001.10913

Abstract: Recent research developing neural network architectures with external memory has often used the benchmark bAbI question and answering dataset, which provides a challenging number of tasks requiring reasoning. Here we employed a classic associative inference task from the memory-based reasoning neuroscience literature in order to more carefully probe the reasoning capacity of existing memory-augmented architectures. This task is thought to capture the essence of reasoning: the appreciation of distant relationships…
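For concreteness, here is a minimal sketch of the kind of associative inference trial referred to above, assuming the standard paired-associate setup (study A-B and B-C pairs, then probe the indirect A-C link); the item names and trial construction are illustrative assumptions, not the paper's actual dataset.

```python
# Illustrative only: build one paired-associate inference trial.
# Assumes the standard A-B / B-C study structure; item names are made up.
import random

def make_trial(items, rng=random):
    a, b, c = rng.sample(items, 3)
    study = [(a, b), (b, c)]                       # directly studied pairs
    distractors = rng.sample([x for x in items if x not in (a, b, c)], 2)
    choices = [c] + distractors
    rng.shuffle(choices)
    return study, (a, choices, c)                  # answering requires linking a-b and b-c

items = ["apple", "chair", "river", "lamp", "stone", "cloud"]
study, (cue, choices, answer) = make_trial(items)
print("study:", study, "| cue:", cue, "| choices:", choices)
```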

Cited by 8 publications (10 citation statements)
References 25 publications
“…Transformers have been shown to outperform RNNs in many tasks in both NLP and computer vision. In particular, their ability to directly access historical states and to learn complex interactions among them has been shown to excel in tasks that require complex long-term temporal dependencies, such as memory-based reasoning (Ritter et al., 2020; Banino et al., 2020). Furthermore, they have been shown to be effective for temporal generation in both language and visual domains.…”
Section: TransDreamer (mentioning; confidence: 99%)
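To make the "direct access to historical states" concrete, below is a minimal single-head dot-product attention read over a buffer of stored states; the function and shapes are assumptions for illustration, not the cited models.

```python
# Sketch only: one attention read over all T stored states, so any past
# step is reachable in a single hop (unlike an RNN's compressed state).
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attend(query, past_states):
    """query: (d,), past_states: (T, d) -> (d,) weighted readout."""
    scores = past_states @ query / np.sqrt(query.shape[-1])  # similarity to each stored state
    return softmax(scores) @ past_states                     # convex combination of history

history = np.random.default_rng(0).normal(size=(10, 4))      # 10 past states, dim 4
print(attend(history[-1], history).shape)                    # (4,)
```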
“…To deal with partial observability (Kaelbling et al., 1998), the dynamics models in MBRL have been implemented using recurrent neural networks (RNNs) (Hafner et al., 2019; Schrittwieser et al., 2020; Kaiser et al., 2019). However, Transformers (Vaswani et al., 2017; Dai et al., 2019) have been shown to be more effective than RNNs in many domains requiring long-term dependencies and direct access to memory for a form of memory-based reasoning (Ritter et al., 2020; Banino et al., 2020). Also, it has been shown that training complex policy networks based on transformers using only rewards is difficult (Parisotto et al., 2020), so learning a transformer-based world model, where the training signal is more diverse, may facilitate learning.…”
Section: Introduction (mentioning; confidence: 99%)
“…In spirit, methods for program induction tend to be closer to neural networks than to symbolic computing. For instance, architectures such as the Neural Turing Machine [197,198], the Differentiable Neural Computer [198,199], the Neural Programmer [200], Neural Programmer-Interpreters [201,195], Neural Program Lattices [202], the Neural State Machine [200], and most recently MEMO [203] extend neural networks with external memory and can infer simple algorithms such as adding numbers, copying, sorting and path finding. [165] illustrates the use of DeepProbLog to solve three program induction tasks and compares their results to Differentiable Forth (∂4) [204].…”
Section: Program Synthesis (mentioning; confidence: 99%)
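As background for the external-memory architectures listed in the quote, the following is a simplified content-based read (cosine similarity sharpened by a softmax over slots) of the sort used, in much richer form, by NTM/DNC-style memories; the names and sharpness parameter are illustrative assumptions.

```python
# Simplified content-addressed read over an external memory matrix.
import numpy as np

def content_read(memory, key, beta=5.0):
    """memory: (N, W) slots, key: (W,) read key, beta: focus sharpness."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(beta * sims)
    w /= w.sum()                      # soft read weights over the N slots
    return w @ memory                 # (W,) read vector

memory = np.eye(4) * 2.0              # toy memory: 4 slots of width 4
print(content_read(memory, np.array([0.0, 1.0, 0.0, 0.0])))
```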
“…For example, Lample et al. (2019) proposed to solve the under-fitting problem of the Transformer by introducing a product-key layer that is similar to a memory module. Banino et al. (2020) proposed MEMO, an adaptive memory to reason over long-distance texts. Compared to these studies, the approach proposed in this paper focuses on leveraging memory for decoding rather than encoding, and presents a relational memory that learns from previous generation processes as well as patterns for long text generation.…”
Section: Related Work (mentioning; confidence: 99%)
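For orientation, here is a generic top-k key-value memory lookup of the kind such decoding-time memories build on; this is a hedged sketch under assumed shapes, not Lample et al.'s product-key layer nor MEMO's retrieval mechanism.

```python
# Generic sketch: retrieve the top_k closest keys and blend their values.
import numpy as np

def kv_read(query, keys, values, top_k=2):
    """query: (d,), keys: (N, d), values: (N, v) -> (v,) blended value."""
    scores = keys @ query
    idx = np.argsort(scores)[-top_k:]            # indices of the best-matching keys
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()
    return w @ values[idx]

rng = np.random.default_rng(1)
keys, values = rng.normal(size=(8, 4)), rng.normal(size=(8, 3))
print(kv_read(np.ones(4), keys, values).shape)   # (3,)
```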