2018
DOI: 10.48550/arxiv.1809.02836
Preprint

Context-Free Transductions with Neural Stacks

Cited by 6 publications (10 citation statements)
References 0 publications

“…For these reasons, we think of the D>1 languages as expressing the core of what it means to be a context-free language with hierarchical structure, even if it is not itself a universal CFL. This property of the Dyck languages accounts for the heavy focus on them in prior work (Deleu and Dureau, 2016; Bernardy, 2018; Sennhauser and Berwick, 2018; Skachkova et al., 2018; Hao et al., 2018; Zaremba et al., 2016; Suzgun et al., 2019a; Yu et al., 2019; Hahn, 2019) as well as in this work. It would thus be notable for finite-precision neural networks to learn languages like the D>1 languages and other languages requiring a stack, if we want these neural architectures to be able to manifest hierarchical structures.…”
Section: Introduction (supporting)
confidence: 63%
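
The D>1 languages referenced in the statement above are Dyck languages with more than one bracket type: sets of well-nested strings whose recognition requires unbounded stack-like memory. As a purely illustrative sketch (not drawn from the cited papers), a minimal stack-based recognizer in Python might look as follows; the function name and the bracket alphabet are arbitrary choices.

def is_dyck(s, pairs=(("(", ")"), ("[", "]"))):
    """Return True iff s is a well-nested string over the given bracket pairs."""
    openers = {o for o, _ in pairs}
    match = {c: o for o, c in pairs}      # closer -> expected opener
    stack = []
    for ch in s:
        if ch in openers:
            stack.append(ch)              # push the opener
        elif ch in match:
            if not stack or stack.pop() != match[ch]:
                return False              # unmatched or crossed brackets
        else:
            return False                  # symbol outside the alphabet
    return not stack                      # every opener must be closed

assert is_dyck("([()[]])")     # well nested: accepted
assert not is_dyck("([)]")     # crossed brackets: rejected

The stack is the essential resource here, which is why these languages serve as probes for whether finite-precision networks can emulate stack behaviour.
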
“…In contrast to the models in Joulin and Mikolov (2015), Grefenstette et al. (2015), Hao et al. (2018), and Yu et al. (2019), our architectures are economical: unless otherwise stated, the models are all single-layer networks with 8 hidden units. In all the experiments, the entries of the memory were set to be one-dimensional, while the size of the memory in the Baby-NTMs was fixed to 104 (since the length of the longest sequence in all the tasks was 100).…”
Section: Training Details (mentioning)
confidence: 93%
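
For concreteness, the "economical" configuration described above (a single-layer controller with 8 hidden units and a memory of 104 one-dimensional entries) could be sketched roughly as below. This is a hypothetical PyTorch illustration, not the Baby-NTM architecture mentioned in the quote: the class name TinyMemoryRNN, the soft read/write scheme, and all other details are assumptions made for the example.

import torch
import torch.nn as nn

class TinyMemoryRNN(nn.Module):
    """Single-layer RNN controller (8 hidden units) plus 104 scalar memory slots.
    The addressing scheme is a simplified placeholder, not the Baby-NTM mechanism."""

    def __init__(self, vocab_size, hidden_size=8, memory_slots=104):
        super().__init__()
        self.rnn = nn.RNNCell(vocab_size + 1, hidden_size)  # input symbol + last read value
        self.addr = nn.Linear(hidden_size, memory_slots)    # soft address over the slots
        self.write = nn.Linear(hidden_size, 1)              # scalar value to write
        self.out = nn.Linear(hidden_size, vocab_size)
        self.memory_slots = memory_slots

    def forward(self, x):  # x: (batch, time, vocab_size) one-hot symbols
        batch, time, _ = x.shape
        h = x.new_zeros(batch, self.rnn.hidden_size)
        mem = x.new_zeros(batch, self.memory_slots)         # 104 one-dimensional entries
        read = x.new_zeros(batch, 1)
        logits = []
        for t in range(time):
            h = self.rnn(torch.cat([x[:, t], read], dim=-1), h)
            w = torch.softmax(self.addr(h), dim=-1)         # where to read/write
            mem = mem + w * self.write(h)                   # additive soft write
            read = (w * mem).sum(dim=-1, keepdim=True)      # soft read-back
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                   # (batch, time, vocab_size)

# Example: per-step logits for two random one-hot sequences of the maximum length 100.
model = TinyMemoryRNN(vocab_size=4)
out = model(torch.eye(4)[torch.randint(4, (2, 100))])       # -> shape (2, 100, 4)
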
“…Finally, observing the defects of both LSTM and Transformer in learning CFG and algorithmic tasks, many works propose to use external memory to enhance the LSTM model (Joulin and Mikolov 2015; Das, Giles, and Sun 1992; Suzgun et al. 2019b), introduce recurrence in the Transformer network (Dehghani et al. 2018), and design specialized architectures (Graves, Wayne, and Danihelka 2014; Hao et al. 2018; Sukhbaatar et al. 2015; Stogin et al. 2020). Though it's commonly believed that LSTM with finite memory, i.e.…”
Section: Related Work (mentioning)
confidence: 99%