Róbert Csordás scite author profile

Róbert Csordás

5Publications

67Citation Statements Received

160Citation Statements Given

How they've been cited

How they cite others

102

160

Affiliations

Budapest University of Technology and Economics, Institute for Computer Science and Control

Publications

Order By: Most citations

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

Csordás¹,

Irie²,

Schmidhuber³

2021

View full text Add to dashboard Cite

Recently, many datasets have been proposed to test the systematic generalization ability of neural networks. The companion baseline Transformers, typically trained with default hyper-parameters from standard tasks, are shown to fail dramatically. Here we demonstrate that by revisiting model configurations as basic as scaling of embeddings, early stopping, relative positional embedding, and Universal Transformer variants, we can drastically improve the performance of Transformers on systematic generalization. We report improvements on five popular datasets: SCAN, CFQ, PCFG, COGS, and Mathematics dataset. Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS. On SCAN, relative positional embedding largely mitigates the EOS decision problem (Newman et al., 2020), yielding 100% accuracy on the length split with a cutoff at 26. Importantly, performance differences between these models are typically invisible on the IID data split. This calls for proper generalization validation sets for developing neural networks that generalize systematically. We publicly release the code to reproduce our results 1 .

show abstract

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

Csordás¹,

Irie²,

Schmidhuber³

2021

Preprint

View full text Add to dashboard Cite

show abstract

Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks

Csordás¹,

Steenkiste²,

Schmidhuber³

2020

Preprint

View full text Add to dashboard Cite

A Modern Self-Referential Weight Matrix That Learns to Modify Itself

Irie¹,

Schlag²,

Csordás³

et al. 2022

Preprint

View full text Add to dashboard Cite

The weight matrix (WM) of a neural network (NN) is its program. The programs of many traditional NNs are learned through gradient descent in some error function, then remain fixed. The WM of a self-referential NN, however, can keep rapidly modifying all of itself during runtime. In principle, such NNs can meta-learn to learn, and metameta-learn to meta-learn to learn, and so on, in the sense of recursive self-improvement. While NN architectures potentially capable of implementing such behavior have been proposed since the '90s, there have been few if any practical studies.Here we revisit such NNs, building upon recent successes of fast weight programmers and closely related linear Transformers. We propose a scalable self-referential WM (SRWM) that uses outer products and the delta update rule to modify itself. We evaluate our SRWM in supervised few-shot learning and in multi-task reinforcement learning with procedurally generated game environments. Our experiments demonstrate both practical applicability and competitive performance of the proposed SRWM. Our code is public † .

show abstract

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers

Irie¹,

Schlag²,

Csordás³

et al. 2021

Preprint

View full text Add to dashboard Cite

Transformers with linearised attention ("linear Transformers") have demonstrated the practical scalability and effectiveness of outer product-based Fast Weight Programmers (FWPs) from the '90s. However, the original FWP formulation is more general than the one of linear Transformers: a slow neural network (NN) continually reprograms the weights of a fast NN with arbitrary NN architectures. In existing linear Transformers, both NNs are feedforward and consist of a single layer. Here we explore new variations by adding recurrence to the slow and fast nets. We evaluate our novel recurrent FWPs (RFWPs) on two synthetic algorithmic tasks (code execution and sequential ListOps), Wikitext-103 language models, and on the Atari 2600 2D game environment. Our models exhibit properties of Transformers and RNNs. In the reinforcement learning setting, we report large improvements over LSTM in several Atari games. Our code is public. 1 * Equal contribution. 1 https://github.com/IDSIA/recurrent-fwp Preprint. Under review.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.