Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.18
KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations

Abstract: Syntactic parsers have dominated natural language understanding for decades. Yet, their syntactic interpretations are losing centrality in downstream tasks due to the success of large-scale textual representation learners. In this paper, we propose KERMIT (Kernel-inspired Encoder with Recursive Mechanism for Interpretable Trees) to embed symbolic syntactic parse trees into artificial neural networks and to visualize how syntax is used in inference. We experimented with KERMIT paired with two state-of-the-art tr…

Cited by 31 publications (35 citation statements)
References 44 publications (41 reference statements)
“…Shuffling 1-grams is a common technique for analyzing word-order sensitivity (Sankar et al., 2019; Zanzotto et al., 2020). We split a given sentence by whitespace into a list of n-grams, and re-combined them, in a random order, back into a "shuffled" sentence (see Table 1 for examples).…”
Section: Random Shuffling Methods
confidence: 99%
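A minimal sketch of the n-gram shuffling probe described in the statement above. This is an illustration, not code from either cited paper; the function name and defaults are assumptions.

```python
import random

def shuffle_ngrams(sentence, n=1, seed=None):
    """Split a sentence on whitespace into consecutive n-grams and
    re-combine them in a random order (word-order-sensitivity probe)."""
    rng = random.Random(seed)
    tokens = sentence.split()
    # Group consecutive tokens into n-grams; the last group may be shorter.
    ngrams = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    rng.shuffle(ngrams)
    return " ".join(tok for gram in ngrams for tok in gram)

if __name__ == "__main__":
    # With n=1 every word is shuffled independently, as in the 1-gram setting.
    print(shuffle_ngrams("the cat sat on the mat", n=1, seed=0))
```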
“…Supplying further syntactic knowledge to a BERT model is a recent field of study and has given promising results on other tasks [12,30]. Although there are indications that the BERT model itself already encodes syntactic information implicitly, the KERMITviz architecture enables visualizing which part of the sentence is used during inference, and using the KERMIT encoder in addition to a transformer has outperformed the standalone models BERT and XLNet on several tasks [30]. Therefore, KERMIT may have the potential to enhance both model performance and interpretability of the predictions on the COLIEE dataset.…”
Section: Contextual Embeddings From Language Models
confidence: 99%
“…In this section, we explain the architecture we used to classify textual entailment using the KERMIT encoder [30] combined with a BERT model. The idea of the KERMIT encoder is based on Recursive Neural Networks (RecNN), which process binary tree structures in the manner of a Recurrent Neural Network [24], as well as Distributed Tree Kernels [29], which encode high-dimensional tree fragments into a lower-dimensional vector representation.…”
Section: KERMIT+BERT
confidence: 99%
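A hedged sketch of how a KERMIT-style tree vector might be combined with BERT for classification, assuming PyTorch and the Hugging Face transformers library. The class name, tree-vector dimension, projection layer, and classification head are illustrative assumptions, not the cited authors' exact configuration; in KERMIT itself the tree vector comes from a distributed-tree-kernel encoding of the parse.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class KermitBertClassifier(nn.Module):
    """Sketch: concatenate BERT's [CLS] embedding with a projected
    tree-encoding vector and classify with a linear head."""

    def __init__(self, tree_dim=4000, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        # Project the (typically high-dimensional) tree vector to BERT's size.
        self.tree_proj = nn.Sequential(nn.Linear(tree_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask, tree_vec):
        # [CLS] token embedding from the last BERT layer.
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]
        tree = self.tree_proj(tree_vec)
        return self.classifier(torch.cat([cls, tree], dim=-1))
```

The design choice sketched here (late fusion by concatenation before a classification head) is one straightforward way to add an explicit syntactic signal next to a pretrained transformer; it leaves the two encoders otherwise independent.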