Tobias Falke scite author profile

While recent progress on abstractive summarization has led to remarkably fluent summaries, factual errors in generated summaries still severely limit their use in practice. In this paper, we evaluate summaries produced by state-of-the-art models via crowdsourcing and show that such errors occur frequently, in particular with more abstractive models. We study whether textual entailment predictions can be used to detect such errors and if they can be reduced by reranking alternative predicted summaries. That leads to an interesting downstream application for entailment models. In our experiments, we find that outof-the-box entailment models trained on NLI datasets do not yet offer the desired performance for the downstream task and we therefore release our annotations as additional test data for future extrinsic evaluations of NLI.

show abstract

Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

Falke¹,

Gurevych²

2017

View full text Add to dashboard Cite

Concept maps can be used to concisely represent important information and bring structure into large document collections. Therefore, we study a variant of multidocument summarization that produces summaries in the form of concept maps. However, suitable evaluation datasets for this task are currently missing. To close this gap, we present a newly created corpus of concept maps that summarize heterogeneous collections of web documents on educational topics. It was created using a novel crowdsourcing approach that allows us to efficiently determine important elements in large document collections. We release the corpus along with a baseline system and proposed evaluation protocol to enable further research on this variant of summarization. 1

show abstract

Data-Efficient Paraphrase Generation to Bootstrap Intent Classification and Slot Labeling for New Features in Task-Oriented Dialog Systems

Jolly

Falke

Tırkaz³

et al. 2020

View full text Add to dashboard Cite

Recent progress through advanced neural models pushed the performance of task-oriented dialog systems to almost perfect accuracy on existing benchmark datasets for intent classification and slot labeling. However, in evolving real-world dialog systems, where new functionality is regularly added, a major additional challenge is the lack of annotated training data for such new functionality, as the necessary data collection efforts are laborious and time-consuming. A potential solution to reduce the effort is to augment initial seed data by paraphrasing existing utterances automatically. In this paper, we propose a new, data-efficient approach following this idea. Using an interpretation-to-text model for paraphrase generation, we are able to rely on existing dialog system training data, and, in combination with shuffling-based sampling techniques, we can obtain diverse and novel paraphrases from small amounts of seed data. In experiments on a public dataset and with a real-world dialog system, we observe improvements for both intent classification and slot labeling, demonstrating the usefulness of our approach.

show abstract

Leveraging User Paraphrasing Behavior In Dialog Systems To Automatically Collect Annotations For Long-Tail Utterances

Falke¹,

Boese²,

Sorokin³

et al. 2020

View full text Add to dashboard Cite

In large-scale commercial dialog systems, users express the same request in a wide variety of alternative ways with a long tail of less frequent alternatives. Handling the full range of this distribution is challenging, in particular when relying on manual annotations. However, the same users also provide useful implicit feedback as they often paraphrase an utterance if the dialog system failed to understand it. We propose MARUPA, a method to leverage this type of feedback by creating annotated training examples from it. MARUPA creates new data in a fully automatic way, without manual intervention or effort from annotators, and specifically for currently failing utterances. By re-training the dialog system on this new data, accuracy and coverage for longtail utterances can be improved. In experiments, we study the effectiveness of this approach in a commercial dialog system across various domains and three languages.

show abstract

GraphDocExplore: A Framework for the Experimental Comparison of Graph-based Document Exploration Techniques

Falke

Gurevych

2017

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Tobias Falke

Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference

Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

Data-Efficient Paraphrase Generation to Bootstrap Intent Classification and Slot Labeling for New Features in Task-Oriented Dialog Systems

Leveraging User Paraphrasing Behavior In Dialog Systems To Automatically Collect Annotations For Long-Tail Utterances

GraphDocExplore: A Framework for the Experimental Comparison of Graph-based Document Exploration Techniques

Contact Info

Product

Resources

About