Proceedings of the Third Workshop on Representation Learning for NLP 2018
DOI: 10.18653/v1/w18-3024

LSTMs Exploit Linguistic Attributes of Data

Abstract: While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data. We investigate how the properties of natural language data affect an LSTM's ability to learn a nonlinguistic task: recalling elements from its input. We find that models trained on natural language data are able to recall tokens from much longer sequences than models trained on non-language sequential data. Furthermore, we show that the LSTM learns to solve th…
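To make the recall task concrete, here is a minimal sketch (assuming PyTorch; not the authors' released code) of an LSTM trained to reproduce the token that appeared at a fixed position in its input. The vocabulary size, sequence length, and the uniform-random control data are illustrative assumptions; the paper's comparison would swap natural-language token streams into the same task in place of the random batches.

```python
# Minimal sketch of a token-recall task for an LSTM (illustrative, not the paper's code).
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # assumption: size of the toy vocabulary
SEQ_LEN = 100       # assumption: length of each input sequence

class RecallLSTM(nn.Module):
    """Reads a token sequence, then predicts the token seen at a target position."""
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        states, _ = self.lstm(self.embed(tokens))
        # Use the final hidden state to recall an earlier token.
        return self.out(states[:, -1, :])

def random_uniform_batch(batch_size):
    """Non-language control data: i.i.d. uniform token sequences."""
    return torch.randint(0, VOCAB_SIZE, (batch_size, SEQ_LEN))

# Training-loop sketch: recall the token at the midpoint of the sequence.
model = RecallLSTM(VOCAB_SIZE)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for step in range(100):
    batch = random_uniform_batch(32)
    target = batch[:, SEQ_LEN // 2]   # the token the model must recall
    loss = loss_fn(model(batch), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```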

Cited by 12 publications (7 citation statements) · References 20 publications
“…These structured inputs may be easier to represent internally regardless of the outputs, given current theories that early stages of training are committed to memorizing inputs (Arpit et al., 2017). As such, we may also want to analyze a probe's capacity to memorize unstructured input: in the case of language, we can easily remove structure by shuffling the word sequences themselves, creating random Zipfian-distributed noise, which is harder for neural networks to exploit (Liu et al., 2018). By providing probes with unstructured input, we measure a more domain-independent sense of complexity than the ability to map structured inputs to random labels, because the model cannot rely on syntactic patterns when memorizing shuffled training data.…”
Section: Non-parametric Metrics of Complexity (mentioning; confidence: 99%)
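As a concrete illustration of the "unstructured input" controls described in the quoted passage, the sketch below (hypothetical helper names; whitespace tokenization assumed) builds both a shuffled-corpus control, which preserves the Zipfian unigram frequencies but destroys word order, and purely synthetic Zipf-distributed noise.

```python
# Two kinds of unstructured control data (illustrative sketch).
import random
import numpy as np

def shuffled_corpus(sentences, seed=0):
    """Shuffle all tokens across the corpus, keeping the unigram (Zipfian) distribution."""
    tokens = [tok for sent in sentences for tok in sent.split()]
    rng = random.Random(seed)
    rng.shuffle(tokens)
    return tokens

def zipfian_noise(n_tokens, vocab_size=10_000, exponent=1.5, seed=0):
    """Sample integer token ids whose frequencies follow a Zipf law."""
    rng = np.random.default_rng(seed)
    ranks = rng.zipf(exponent, size=n_tokens)   # power-law-distributed ranks 1, 2, 3, ...
    return np.clip(ranks, 1, vocab_size) - 1    # map onto a finite vocabulary of ids

corpus = ["the cat sat on the mat", "the dog chased the cat"]
print(shuffled_corpus(corpus))   # same words, random order
print(zipfian_noise(10))         # e.g. [0 2 0 0 1 ...]
```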
“…This approach has been used to probe for different linguistic properties captured within the network. For example, researchers probed for (i) morphology using attention weights [35], or recurrent neural network (RNN)/transformer representations [51,52,53], in neural machine translation (NMT) [25,26] and language model (LM) [23,24] neurons; (ii) anaphora [54]; (iii) lexical semantics with LM and NMT states [25,26]; and (iv) word presence [55], subject-verb agreement [56], relative islands [57], number agreement [58], semantic roles [59], and syntactic information [52,56,49,57,60], among others, using hidden states. A detailed survey is presented in [61].…”
Section: Related Work (mentioning; confidence: 99%)
“…They mainly discuss the extent to which the RNN acquires syntax by comparing experimental accuracy on some syntactic structures, such as number agreement (see Section 7 for details). Some studies also investigate in which vector spaces and layers specific syntactic information is captured (Liu et al., 2018; Liu et al., 2019). Recently, Suzgun et al. (2019) trained LSTMs on the Dyck-{1,2} formal languages and showed that they can emulate counter machines.…”
Section: Introduction (mentioning; confidence: 99%)
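For readers unfamiliar with the Dyck languages mentioned above, the sketch below (illustrative only, not Suzgun et al.'s setup) generates and checks Dyck-1 strings, i.e. well-nested sequences over a single bracket pair; recognizing them only requires maintaining one counter of unmatched open brackets, which is why they are a natural probe for counter-machine behavior in LSTMs.

```python
# Generate and verify Dyck-1 (balanced bracket) strings: an illustrative sketch.
import random

def sample_dyck1(max_len=20, p_open=0.5, seed=None):
    """Generate a balanced bracket string of length <= max_len."""
    rng = random.Random(seed)
    out, depth = [], 0
    while len(out) < max_len - depth:
        if depth == 0 or rng.random() < p_open:
            out.append("(")
            depth += 1
        else:
            out.append(")")
            depth -= 1
    out.extend(")" * depth)   # close any brackets still open
    return "".join(out)

def is_dyck1(s):
    """Membership test: the running bracket count never goes negative and ends at zero."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

print(sample_dyck1(seed=0))            # e.g. "(()(()))"
assert is_dyck1(sample_dyck1(seed=1))  # generated strings are always well-nested
```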