2012
DOI: 10.1080/09540091.2011.641939
Processing of nested and cross-serial dependencies: an automaton perspective on SRN behaviour

Abstract: Language processing involves the identification and establishment of both nested (stack-like) and cross-serial (queue-like) dependencies. This paper analyses the behaviour of simple recurrent networks (SRNs) trained to handle these types of dependency individually and simultaneously. We provide new converging evidence that SRNs store sequences in a fractal data structure similar to a binary expansion. We provide evidence that the process of recalling a stored string by an SRN depletes the stored data structure,…
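
To make the abstract's central claim concrete, here is a minimal sketch of the binary-expansion ("fractal") stack encoding it refers to. This is an assumed toy illustration in Python, not the authors' trained SRN; the names `push`/`pop` and the exact affine maps are expository choices.

```python
# Toy illustration (assumed, not the paper's SRN): a stack of bits stored as one
# number in [0, 1) via its binary expansion. Pushing contracts the state into a
# half of the interval; recalling (popping) expands it back, so reading out a
# stored string literally depletes the stored data structure, as the abstract says.

def push(state: float, bit: int) -> float:
    """Make `bit` the most significant digit of the expansion (contraction)."""
    return 0.5 * state + 0.5 * bit

def pop(state: float) -> tuple[int, float]:
    """Read the most recently pushed bit and return the depleted remainder."""
    bit = 1 if state >= 0.5 else 0
    return bit, 2.0 * state - bit          # expansion undoes the contraction

s = 0.0
for b in (1, 1, 0):                        # store the sequence 1, 1, 0
    s = push(s, b)

recalled = []
while s > 0:
    b, s = pop(s)                          # each recall step consumes the structure
    recalled.append(b)

print(recalled)                            # [0, 1, 1]: last-in, first-out, i.e. nested
                                           # (stack-like) rather than cross-serial order
```

Reading digits from the other end of the same expansion instead yields first-in, first-out order, the cross-serial (queue-like) case the paper also studies.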

Cited by 13 publications (11 citation statements) · References 24 publications
“…bitstrings can be predicted with perfect accuracy and cross-entropy, independent of the input length. Furthermore, infinite-precision RNNs and LSTMs can model stacks (Tabor, 2000; Grüning, 2006; Kirov and Frank, 2012) and thus are theoretically capable of modeling 2DYCK and other deterministic context-free languages perfectly. The results presented here thus theoretically confirm the intuition that models entirely built on self-attention may have restricted expressivity when compared to recurrent architectures (Tran et al., 2018; Dehghani et al., 2019; Shen et al., 2018a; Chen et al., 2018; Hao et al., 2019).…”
Section: Discussion
confidence: 99%
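
As a concrete (assumed) illustration of the quoted point that an unbounded-precision recurrent state can implement a stack, and hence handle 2DYCK, the sketch below checks two-bracket Dyck strings using a single rational number as the stack; `is_dyck2` and the base-4 digit encoding are choices made here, not constructions taken from the cited papers.

```python
# Assumed toy construction, not code from the cited papers: the stack of open
# brackets is one rational number; each push stores a base-4 digit (contraction),
# each pop reads and removes the most recent digit (expansion). Fraction stands
# in for the "infinite precision" the quoted passage assumes.
from fractions import Fraction

PUSH = {"(": 1, "[": 2}   # digit written when an opening bracket is pushed
POP  = {")": 1, "]": 2}   # digit that must be on top for a closing bracket

def is_dyck2(string: str) -> bool:
    stack = Fraction(0)                       # empty stack
    for ch in string:
        if ch in PUSH:
            stack = (stack + PUSH[ch]) / 4    # push: contract into a new top digit
        elif ch in POP:
            top = int(stack * 4)              # read the top digit
            if top != POP[ch]:
                return False                  # mismatched bracket or empty stack
            stack = stack * 4 - top           # pop: expand, discarding that digit
        else:
            return False                      # symbol outside the 2DYCK alphabet
    return stack == 0                         # accept iff every bracket was closed

print(is_dyck2("([()[]])"))   # True: properly nested
print(is_dyck2("([)]"))       # False: crossing brackets are not nested
```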
“…Recent NLP work has also found that neural networks do not readily transfer knowledge across tasks; e.g., pretrained models often perform worse than non-pretrained models (Wang et al., 2019). This lack of generalization across tasks might be due to the tendency of multi-task neural networks to create largely independent representations for different tasks even when a shared representation could be used (Kirov and Frank, 2012). Therefore, to make cross-phenomenon generalizations, neural networks may need to be given an explicit bias for sharing processing across phenomena.…”
Section: Will Models Generalize Across
confidence: 99%
“…This property amounts to a short‐shrifting of the encoding resources used for more deeply embedded causal states relative to less deeply embedded causal states: If there is noise in the encodings, the noise distorts deeper embeddings more than shallow ones. This short‐shrifting is plausibly related to the well‐known limited ability of humans to process deep center‐embeddings; see (Christiansen & Chater; Kirov & Frank, 2012). We would like, therefore, to objectively determine whether the system is exhibiting contraction for pushes and expansion for pops.…”
Section: Fractal Learning Neural Network
confidence: 97%
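
The contraction/expansion argument in this last excerpt can be checked with a small simulation. The setup below is an assumed toy model, not the cited fractal learning network: one dose of noise is added to a contracted binary-expansion stack, every pop doubles the residual noise, and recovery accuracy therefore drops for the most deeply embedded symbols, echoing the limited human ability to process deep centre-embeddings.

```python
# Assumed toy model (not the cited network): a bit stack stored as a contracted
# binary expansion is perturbed once, then popped back out. Each pop is an
# expansion that doubles the leftover noise, so the deepest (earliest-pushed)
# symbols are recovered least reliably.
import random

def push(state, bit):
    return 0.5 * state + 0.5 * bit             # contraction

def recall_accuracy(bits, noise=1e-3, trials=2000):
    """Per-position recovery rate, listed in push order (deepest first)."""
    correct = [0] * len(bits)
    for _ in range(trials):
        s = 0.0
        for b in bits:
            s = push(s, b)
        s += random.uniform(-noise, noise)      # one shot of encoding noise
        for k in range(len(bits)):              # pop in reverse push order
            b_hat = 1 if s >= 0.5 else 0
            s = 2.0 * s - b_hat                 # expansion: residual noise doubles
            idx = len(bits) - 1 - k             # position of the bit just popped
            correct[idx] += (b_hat == bits[idx])
    return [c / trials for c in correct]

random.seed(0)
bits = [random.randint(0, 1) for _ in range(12)]
print(recall_accuracy(bits))
# The tail of the list (shallow, recently pushed bits) stays near 1.0; the head
# (deeply embedded bits, popped only after many expansions) degrades toward chance.
```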