Learning Context-free Languages with Nondeterministic Stack RNNs

DuSell, Brian; Chiang, David

doi:10.18653/v1/2020.conll-1.41

Cited by 5 publications

(17 citation statements)

References 18 publications

“…We begin by discussing three previously proposed stack RNNs, each of which uses a different style of differentiable stack: stratification [2,22,5], superposition [11], and nondeterminism [3].…”

Section: Previous Stack RNNs (mentioning)

confidence: 99%

“…Following DuSell and Chiang [3], we make minor changes to the original model definitions given by Grefenstette et al. [5] and Joulin and Mikolov [11] to ensure that all three of these stack RNN models conform to the same controller-stack interface. This allows us to isolate differences in the style of stack data structure employed while keeping other parts of the network the same.…”

Section: Controller-stack Interface (mentioning)

confidence: 99%

“…Many interesting machine learning problems involve sequential data which contain hierarchical structures, such as modeling context-free languages [5,3], evaluating mathematical expressions [18,7], and modeling syntax in natural language [21]. However, recurrent neural networks (RNNs) have been shown to have difficulty learning to solve these tasks, or generalizing to held-out sequences, unless they have supervision or a hierarchical inductive bias [19,24,14].…”

Section: Introduction (mentioning)

confidence: 99%

“…To remedy this, some previous work has investigated the addition of differentiable stack data structures to RNNs [22,5,11,3]. Just as adding a stack to a finite state machine, which makes it a pushdown automaton (PDA), enables it to recognize context-free languages (CFLs), the hope is that adding stacks to RNNs will increase the range of problems on which they can be used effectively.…”

Section: Introduction (mentioning)

confidence: 99%

“…DuSell and Chiang [3] recently proposed a stack-based RNN called the Nondeterministic Stack RNN (NS-RNN) that outperformed other stack RNNs on a range of CFL language modeling tasks. This model's defining feature is that its external data structure is a nondeterministic PDA, allowing it to simulate an exponential number of sequences of stack operations in parallel.…”

Section: Introduction (mentioning)

confidence: 99%

Learning Hierarchical Structures with Differentiable Nondeterministic Stacks

DuSell, Chiang

2021

Preprint

Self Cite

Learning hierarchical structures in sequential data, from simple algorithmic patterns to natural language, in a reliable, generalizable way remains a challenging problem for neural language models. Past work has shown that recurrent neural networks (RNNs) struggle to generalize on held-out algorithmic or syntactic patterns without supervision or some inductive bias. To remedy this, many papers have explored augmenting RNNs with various differentiable stacks, by analogy with finite automata and pushdown automata. In this paper, we present a stack RNN model based on the recently proposed Nondeterministic Stack RNN (NS-RNN) that achieves lower cross-entropy than all previous stack RNNs on five context-free language modeling tasks (within 0.05 nats of the information-theoretic lower bound), including a task in which the NS-RNN previously failed to outperform a deterministic stack RNN baseline. Our model assigns arbitrary positive weights instead of probabilities to stack actions, and we provide an analysis of why this improves training. We also propose a restricted version of the NS-RNN that makes it practical to use for language modeling on natural language and present results on the Penn Treebank corpus.

Dance preservation archives

DuSell

Human language is full of compositional syntactic structures, and although neural networks have contributed to groundbreaking improvements in computer systems that process language, widely-used neural network architectures still exhibit limitations in their ability to process syntax. To address this issue, prior work has proposed adding stack data structures to neural networks, drawing inspiration from theoretical connections between syntax and stacks. However, these methods employ deterministic stacks that are designed to track one parse at a time, whereas syntactic ambiguity, which requires a nondeterministic stack to parse, is extremely common in language. In this dissertation, we

Pushdown Layers: Encoding Recursive Structure in Transformer Language Models

Murty, Sharma, Andreas, et al.

2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recursion is a prominent feature of human language, and fundamentally challenging for self-attention due to the lack of an explicit recursive-state tracking mechanism. Consequently, Transformer language models poorly capture long-tail recursive structure and exhibit sample-inefficient syntactic generalization. This work introduces Pushdown Layers, a new self-attention layer that models recursive state via a stack tape that tracks estimated depths of every token in an incremental parse of the observed prefix. Transformer LMs with Pushdown Layers are syntactic language models that autoregressively and synchronously update this stack tape as they predict new tokens, in turn using the stack tape to softly modulate attention over tokens (for instance, learning to "skip" over closed constituents). When trained on a corpus of strings annotated with silver constituency parses, Transformers equipped with Pushdown Layers achieve dramatically better and 3-5x more sample-efficient syntactic generalization, while maintaining similar perplexities. Pushdown Layers are a drop-in replacement for standard self-attention. We illustrate this by finetuning GPT2-medium with Pushdown Layers on an automatically parsed WikiText-103, leading to improvements on several GLUE text classification tasks.
