Discovering the Compositional Structure of Vector Representations with Role Learning Networks

Soulos, Paul; McCoy, R. Thomas; Linzen, Tal; Smolensky, Paul

doi:10.18653/v1/2020.blackboxnlp-1.23

Cited by 21 publications

(20 citation statements)

References 34 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…We excluded grounded data sets such as CLEVR (Johnson et al, 2017) and SQOOP (Bahdanau et al, 2018), which contain more than one modality. Furthermore, we did not include studies whose primary focus is on how neural networks implement compositional structures (Giulianelli, Harding, Mohnert, Hupkes, & Zuidema, 2018;Lakretz et al, 2019;McCoy, Linzen, Dunbar, & Smolensky, 2019;Soulos, McCoy, Linzen, & Smolensky, 2019;Weiss, Goldberg, & Yahav, 2018a) or studies that evaluate compositionality only based on models' representations (Andreas, 2019). 3.…”

Section: Arithmetic Language and Mathematical Reasoningmentioning

confidence: 99%

Compositionality Decomposed: How do Neural Networks Generalise?

Hupkes

Dankers

Mul³

et al. 2020

jair

149

164

View full text Add to dashboard Cite

Despite a multitude of empirical studies, little consensus exists on whether neural networks are able to generalise compositionally, a controversy that, in part, stems from a lack of agreement about what it means for a neural model to be compositional. As a response to this controversy, we present a set of tests that provide a bridge between, on the one hand, the vast amount of linguistic and philosophical theory about compositionality of language and, on the other, the successful neural models of language. We collect different interpretations of compositionality and translate them into five theoretically grounded tests for models that are formulated on a task-independent level. In particular, we provide tests to investigate (i) if models systematically recombine known parts and rules (ii) if models can extend their predictions beyond the length they have seen in the training data (iii) if models’ composition operations are local or global (iv) if models’ predictions are robust to synonym substitutions and (v) if models favour rules or exceptions during training. To demonstrate the usefulness of this evaluation paradigm, we instantiate these five tests on a highly compositional data set which we dub PCFG SET and apply the resulting tests to three popular sequence-to-sequence models: a recurrent, a convolution-based and a transformer model. We provide an in-depth analysis of the results, which uncover the strengths and weaknesses of these three architectures and point to potential areas of improvement.

show abstract

Section: Arithmetic Language and Mathematical Reasoningmentioning

confidence: 99%

Compositionality Decomposed: How do Neural Networks Generalise?

Hupkes

Dankers

Mul³

et al. 2020

jair

149

164

View full text Add to dashboard Cite

show abstract

“…Further work has found impressive degrees of syntactic structure in Transformer encodings (Hewitt and Manning, 2019) (Soulos et al, 2020) Our position in this paper is simple: we argue that the literature on syntactic probing is methodologically flawed, owing to a conflation of syntax with semantics. We contend that no existing probing work has rigorously tested whether BERT encodes syntax, and a fortiori this literature should not be used to support this claim.…”

mentioning

confidence: 83%

Do Syntactic Probes Probe Syntax? Experiments with Jabberwocky Probing

Maudslay¹,

Cotterell²

2021

Preprint

View full text Add to dashboard Cite

Analysing whether neural language models encode linguistic information has become popular in NLP. One method of doing so, which is frequently cited to support the claim that models like BERT encode syntax, is called probing; probes are small supervised models trained to extract linguistic information from another model's output. If a probe is able to predict a particular structure, it is argued that the model whose output it is trained on must have implicitly learnt to encode it. However, drawing a generalisation about a model's linguistic knowledge about a specific phenomena based on what a probe is able to learn may be problematic: in this work, we show that semantic cues in training data means that syntactic probes do not properly isolate syntax. We generate a new corpus of semantically nonsensical but syntactically well-formed Jabberwocky sentences, which we use to evaluate two probes trained on normal data. We train the probes on several popular language models (BERT, GPT-2, and RoBERTa), and find that in all settings they perform worse when evaluated on these data, for one probe by an average of 15.4 UUAS points absolute. Although in most cases they still outperform the baselines, their lead is reduced substantially, e.g. by 53% in the case of BERT for one probe. This begs the question: what empirical scores constitute knowing syntax? 'Twas Brillig, and the Slithy TovesRecently, unsupervised language models like BERT (Devlin et al., 2019) have become popular within natural language processing (NLP). These pre-trained sentence encoders, known affectionately as BERToids (Rogers et al., 2020), have pushed forward the state of the art in many NLP tasks. Given their impressive performance, a natural question to ask is whether models like these implicitly learn to encode linguistic structures, such as part-of-speech tags or dependency trees.

show abstract

“…Implicit TPR Encodings in Neural Networks showed that, in GRUbased (Cho et al, 2014) encoder-decoder networks performing fully-compositional string manipulations, trained on extensive data that fully exemplifies the range of possible compositions, the medial encoding between encoder and decoder could be extremely well approximated by TPRs. Soulos et al (2019) presented the ROLE model that learns its own role scheme to optimize the fit of a TPR approximation to a given set of internal representations in a pre-trained target neural network, removing the need for human-generated hypotheses about the role schemes the network might be implementing. While this work successfully interprets the Tensor Product Representation in fully compositional tasks, abstractive summarization, as well as most other NLP tasks, are only partially compositional and the symbolic rules in language are much more complex.…”

Section: Related Workmentioning

confidence: 99%

Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization

Jiang¹,

Çelikyılmaz²,

Smolensky³

et al. 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

Self Cite

View full text Add to dashboard Cite

ive summarization, the task of generating a concise summary of input documents, requires: (1) reasoning over the source document to determine the salient pieces of information scattered across the long document, and (2) composing a cohesive text by reconstructing these salient facts into a shorter summary that faithfully reflects the complex relations connecting these facts. In this paper, we adapt TP-TRANSFORMER (Schlag et al., 2019), an architecture that enriches the original Transformer (Vaswani et al., 2017) with the explicitly compositional Tensor Product Representation (TPR), for the task of abstractive summarization. The key feature of our model is a structural bias that we introduce by encoding two separate representations for each token to represent the syntactic structure (with role vectors) and semantic content (with filler vectors) separately. The model then binds the role and filler vectors into the TPR as the layer output. We argue that the structured intermediate representations enable the model to take better control of the contents (salient facts) and structures (the syntax that connects the facts) when generating the summary. Empirically, we show that our TP-TRANSFORMER outperforms the Transformer and the original TP-TRANSFORMER significantly on several abstractive summarization datasets based on both automatic and human evaluations. On several syntactic and semantic probing tasks, we demonstrate the emergent structural information in the role vectors and improved syntactic interpretability in the TPR layer outputs. 1 * Work partially done while at Microsoft Research.

show abstract

Discovering the Compositional Structure of Vector Representations with Role Learning Networks

Cited by 21 publications

References 34 publications

Compositionality Decomposed: How do Neural Networks Generalise?

Compositionality Decomposed: How do Neural Networks Generalise?

Do Syntactic Probes Probe Syntax? Experiments with Jabberwocky Probing

Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization

Contact Info

Product

Resources

About