Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.18
KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations

Abstract: Syntactic parsers have dominated natural language understanding for decades. Yet, their syntactic interpretations are losing centrality in downstream tasks due to the success of large-scale textual representation learners. In this paper, we propose KERMIT (Kernel-inspired Encoder with Recursive Mechanism for Interpretable Trees) to embed symbolic syntactic parse trees into artificial neural networks and to visualize how syntax is used in inference. We experimented with KERMIT paired with two state-of-the-art tr…

Cited by 31 publications (35 citation statements)
References 44 publications (41 reference statements)
“…Shuffling 1-grams is a common technique for analyzing word-order sensitivity (Sankar et al., 2019; Zanzotto et al., 2020). We split a given sentence by whitespace into a list of n-grams, and re-combined them, in a random order, back into a "shuffled" sentence (see Table 1 for examples).…”
Section: Random Shuffling Methods
confidence: 99%
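A minimal sketch of the n-gram shuffling probe described in the statement above. This is an illustration, not code from either cited paper; the function name and defaults are assumptions.

```python
import random

def shuffle_ngrams(sentence, n=1, seed=None):
    """Split a sentence on whitespace into consecutive n-grams and
    re-combine them in a random order (word-order-sensitivity probe)."""
    rng = random.Random(seed)
    tokens = sentence.split()
    # Group consecutive tokens into n-grams; the last group may be shorter.
    ngrams = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    rng.shuffle(ngrams)
    return " ".join(tok for gram in ngrams for tok in gram)

if __name__ == "__main__":
    # With n=1 every word is shuffled independently, as in the 1-gram setting.
    print(shuffle_ngrams("the cat sat on the mat", n=1, seed=0))
```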
“…Supplying further syntactic knowledge to a BERT model is a recent field of study and has given promising results on other tasks [12,30]. Although there are indications that the BERT model itself already encodes syntactic information implicitly, the KERMITviz architecture enables visualizing which part of the sentence is used during inference, and using the KERMIT encoder in addition to a transformer has outperformed the standalone models BERT and XLNet on several tasks [30]. Therefore, KERMIT may have the potential to enhance both model performance and interpretability of the predictions on the COLIEE dataset.…”
Section: Contextual Embeddings From Language Models
confidence: 99%
“…In this section, we explain the architecture we used to classify textual entailment using the KERMIT encoder [30] combined with a BERT model. The idea of the KERMIT encoder is based on Recursive Neural Networks (RecNN), which process binary tree structures in the manner of a Recurrent Neural Network [24], as well as Distributed Tree Kernels [29], which encode high-dimensional tree fragments into a lower-dimensional vector representation.…”
Section: KERMIT+BERT
confidence: 99%
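A hedged sketch of how a KERMIT-style tree vector might be combined with BERT for classification, assuming PyTorch and the Hugging Face transformers library. The class name, tree-vector dimension, projection layer, and classification head are illustrative assumptions, not the cited authors' exact configuration; in KERMIT itself the tree vector comes from a distributed-tree-kernel encoding of the parse.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class KermitBertClassifier(nn.Module):
    """Sketch: concatenate BERT's [CLS] embedding with a projected
    tree-encoding vector and classify with a linear head."""

    def __init__(self, tree_dim=4000, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        # Project the (typically high-dimensional) tree vector to BERT's size.
        self.tree_proj = nn.Sequential(nn.Linear(tree_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask, tree_vec):
        # [CLS] token embedding from the last BERT layer.
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]
        tree = self.tree_proj(tree_vec)
        return self.classifier(torch.cat([cls, tree], dim=-1))
```

The design choice sketched here (late fusion by concatenation before a classification head) is one straightforward way to add an explicit syntactic signal next to a pretrained transformer; it leaves the two encoders otherwise independent.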