Proceedings of the Third Workshop on Representation Learning for NLP 2018
DOI: 10.18653/v1/w18-3024

LSTMs Exploit Linguistic Attributes of Data

Abstract: While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data. We investigate how the properties of natural language data affect an LSTM's ability to learn a nonlinguistic task: recalling elements from its input. We find that models trained on natural language data are able to recall tokens from much longer sequences than models trained on non-language sequential data. Furthermore, we show that the LSTM learns to solve th…
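To make the recall task concrete, here is a minimal sketch (assuming PyTorch; not the authors' released code) of an LSTM trained to reproduce the token that appeared at a fixed position in its input. The vocabulary size, sequence length, and the uniform-random control data are illustrative assumptions; the paper's comparison would swap natural-language token streams into the same task in place of the random batches.

```python
# Minimal sketch of a token-recall task for an LSTM (illustrative, not the paper's code).
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # assumption: size of the toy vocabulary
SEQ_LEN = 100       # assumption: length of each input sequence

class RecallLSTM(nn.Module):
    """Reads a token sequence, then predicts the token seen at a target position."""
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        states, _ = self.lstm(self.embed(tokens))
        # Use the final hidden state to recall an earlier token.
        return self.out(states[:, -1, :])

def random_uniform_batch(batch_size):
    """Non-language control data: i.i.d. uniform token sequences."""
    return torch.randint(0, VOCAB_SIZE, (batch_size, SEQ_LEN))

# Training-loop sketch: recall the token at the midpoint of the sequence.
model = RecallLSTM(VOCAB_SIZE)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for step in range(100):
    batch = random_uniform_batch(32)
    target = batch[:, SEQ_LEN // 2]   # the token the model must recall
    loss = loss_fn(model(batch), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```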

Cited by 12 publications (7 citation statements) · References 20 publications
“…These structured inputs may be easier to represent internally regardless of the outputs, given current theories that early stages of training are committed to memorizing inputs (Arpit et al., 2017). As such, we may also want to analyze a probe's capacity to memorize unstructured input: in the case of language, we can easily remove structure by shuffling the word sequences themselves, creating random Zipfian-distributed noise, which is harder for neural networks to exploit (Liu et al., 2018). By providing probes with unstructured input, we measure a more domain-independent sense of complexity than the ability to map structured inputs to random labels, because the model cannot rely on syntactic patterns when memorizing shuffled training data.…”
Section: Non-parametric Metrics of Complexity (mentioning; confidence: 99%)
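As a concrete illustration of the "unstructured input" controls described in the quoted passage, the sketch below (hypothetical helper names; whitespace tokenization assumed) builds both a shuffled-corpus control, which preserves the Zipfian unigram frequencies but destroys word order, and purely synthetic Zipf-distributed noise.

```python
# Two kinds of unstructured control data (illustrative sketch).
import random
import numpy as np

def shuffled_corpus(sentences, seed=0):
    """Shuffle all tokens across the corpus, keeping the unigram (Zipfian) distribution."""
    tokens = [tok for sent in sentences for tok in sent.split()]
    rng = random.Random(seed)
    rng.shuffle(tokens)
    return tokens

def zipfian_noise(n_tokens, vocab_size=10_000, exponent=1.5, seed=0):
    """Sample integer token ids whose frequencies follow a Zipf law."""
    rng = np.random.default_rng(seed)
    ranks = rng.zipf(exponent, size=n_tokens)   # power-law-distributed ranks 1, 2, 3, ...
    return np.clip(ranks, 1, vocab_size) - 1    # map onto a finite vocabulary of ids

corpus = ["the cat sat on the mat", "the dog chased the cat"]
print(shuffled_corpus(corpus))   # same words, random order
print(zipfian_noise(10))         # e.g. [0 2 0 0 1 ...]
```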
“…This approach has been used to probe for different linguistic properties captured within the network. For example, researchers probed for (i) morphology using attention weights [35], or recurrent neural network (RNN)/transformer representations [51,52,53], in neural machine translation (NMT) [25,26] and language model (LM) [23,24] neurons; (ii) anaphora [54]; (iii) lexical semantics with LM and NMT states [25,26]; and (iv) word presence [55], subject-verb agreement [56], relative islands [57], number agreement [58], semantic roles [59], and syntactic information [52,56,49,57,60], among others, using hidden states. A detailed survey is presented in [61].…”
Section: Related Work (mentioning; confidence: 99%)
“…They mainly discuss the extent to which the RNN acquires syntax by comparing experimental accuracy on some syntactic structures, such as number agreement (see Section 7 for details). Some studies also investigate in which vector spaces and layers specific syntactic information is captured (Liu et al., 2018; Liu et al., 2019). Recently, Suzgun et al. (2019) trained LSTMs on the Dyck-{1,2} formal languages and showed that they can emulate counter machines.…”
Section: Introduction (mentioning; confidence: 99%)
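For readers unfamiliar with the Dyck languages mentioned above, the sketch below (illustrative only, not Suzgun et al.'s setup) generates and checks Dyck-1 strings, i.e. well-nested sequences over a single bracket pair; recognizing them only requires maintaining one counter of unmatched open brackets, which is why they are a natural probe for counter-machine behavior in LSTMs.

```python
# Generate and verify Dyck-1 (balanced bracket) strings: an illustrative sketch.
import random

def sample_dyck1(max_len=20, p_open=0.5, seed=None):
    """Generate a balanced bracket string of length <= max_len."""
    rng = random.Random(seed)
    out, depth = [], 0
    while len(out) < max_len - depth:
        if depth == 0 or rng.random() < p_open:
            out.append("(")
            depth += 1
        else:
            out.append(")")
            depth -= 1
    out.extend(")" * depth)   # close any brackets still open
    return "".join(out)

def is_dyck1(s):
    """Membership test: the running bracket count never goes negative and ends at zero."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

print(sample_dyck1(seed=0))            # e.g. "(()(()))"
assert is_dyck1(sample_dyck1(seed=1))  # generated strings are always well-nested
```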