Katja Filippova scite author profile

We present an LSTM approach to deletion-based sentence compression where the task is to translate a sentence into a sequence of zeros and ones, corresponding to token deletion decisions. We demonstrate that even the most basic version of the system, which is given no syntactic information (no PoS or NE tags, or dependencies) or desired compression length, performs surprisingly well: around 30% of the compressions from a large test set could be regenerated. We compare the LSTM system with a competitive baseline which is trained on the same amount of data but is additionally provided with all kinds of linguistic features. In an experiment with human raters the LSTMbased model outperforms the baseline achieving 4.5 in readability and 3.8 in informativeness.

show abstract

Dependency tree based sentence compression

Filippova

Strube

2008

View full text Add to dashboard Cite

We present a novel unsupervised method for sentence compression which relies on a dependency tree representation and shortens sentences by removing subtrees. An automatic evaluation shows that our method obtains result comparable or superior to the state of the art. We demonstrate that the choice of the parser affects the performance of the system. We also apply the method to German and report the results of an evaluation with humans.

show abstract

The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?

Bastings

Filippova

2020

112

View full text Add to dashboard Cite

There is a recent surge of interest in using attention as explanation of model predictions, with mixed evidence on whether attention can be used as such. While attention conveniently gives us one weight per input token and is easily extracted, it is often unclear toward what goal it is used as explanation. We find that often that goal, whether explicitly stated or not, is to find out what input tokens are the most relevant to a prediction, and that the implied user for the explanation is a model developer. For this goal and user, we argue that input saliency methods are better suited, and that there are no compelling reasons to use attention, despite the coincidence that it provides a weight for each input. With this position paper, we hope to shift some of the recent focus on attention to saliency methods, and for authors to clearly state the goal and user for their explanations.

show abstract

Sentence fusion via dependency graph compression

Filippova

Strube

2008

View full text Add to dashboard Cite

We present a novel unsupervised sentence fusion method which we apply to a corpus of biographies in German. Given a group of related sentences, we align their dependency trees and build a dependency graph. Using integer linear programming we compress this graph to a new tree, which we then linearize. We use GermaNet and Wikipedia for checking semantic compatibility of co-arguments. In an evaluation with human judges our method outperforms the fusion approach of Barzilay & McKeown (2005) with respect to readability.

show abstract

We Need To Talk About Random Splits

Søgaard¹,

Ebert²,

Bastings³

et al. 2021

View full text Add to dashboard Cite

Gorman and Bedrick ( 2019) argued for using random splits rather than standard splits in NLP experiments. We argue that random splits, like standard splits, lead to overly optimistic performance estimates. We can also split data in biased or adversarial ways, e.g., training on short sentences and evaluating on long ones. Biased sampling has been used in domain adaptation to simulate real-world drift; this is known as the covariate shift assumption. In NLP, however, even worst-case splits, maximizing bias, often under-estimate the error observed on new samples of in-domain data, i.e., the data that models should minimally generalize to at test time. This invalidates the covariate shift assumption. Instead of using multiple random splits, future benchmarks should ideally include multiple, independent test sets instead; if infeasible, we argue that multiple biased splits leads to more realistic performance estimates than multiple random splits.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Katja Filippova

Sentence Compression by Deletion with LSTMs

Dependency tree based sentence compression

The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?

Sentence fusion via dependency graph compression

We Need To Talk About Random Splits

Contact Info

Product

Resources

About