Proceedings of the Workshop on New Frontiers in Summarization 2017
DOI: 10.18653/v1/w17-4508

TL;DR: Mining Reddit to Learn Automatic Summarization

Abstract: Recent advances in automatic text summarization have used deep neural networks to generate high-quality abstractive summaries, but the performance of these models strongly depends on large amounts of suitable training data. We propose a new method for mining social media for author-provided summaries, taking advantage of the common practice of appending a "TL;DR" to long posts. A case study using a large Reddit crawl yields the Webis-TLDR-17 corpus, complementing existing corpora primarily from the news genre.
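The mining idea in the abstract lends itself to a compact implementation: scan each post for a "TL;DR" marker and split it into a (content, summary) pair. Below is a minimal sketch of that idea in Python; the regex, thresholds, and the mine_tldr_pair helper are illustrative assumptions, not the authors' actual pipeline.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Variants like "TL;DR", "tldr:", "tl dr -" appear in the wild; this pattern
# is an illustrative approximation, not the exact one used for Webis-TLDR-17.
TLDR_RE = re.compile(r"\btl\s*;?\s*dr\b\s*[:\-]?\s*", re.IGNORECASE)

def mine_tldr_pair(post: str, min_content: int = 100,
                   min_summary: int = 10) -> Optional[Tuple[str, str]]:
    """Split a post at its last TL;DR marker into (content, summary)."""
    matches = list(TLDR_RE.finditer(post))
    if not matches:
        return None
    # Use the last marker: author-provided summaries typically trail the post.
    last = matches[-1]
    content = post[:last.start()].strip()
    summary = post[last.end():].strip()
    # Length filters discard degenerate pairs (stub posts, empty summaries).
    if len(content) < min_content or len(summary) < min_summary:
        return None
    return content, summary

def mine_corpus(posts: Iterable[str]) -> Iterator[Tuple[str, str]]:
    for post in posts:
        pair = mine_tldr_pair(post)
        if pair:
            yield pair
```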

Cited by 75 publications (74 citation statements) | References 10 publications
“…For the former, we use the CNN/DailyMail news dataset (Hermann et al., 2015), widely used for the task of abstractive text summarization. For the latter, we use the Webis-TLDR-17 corpus (Völske et al., 2017), automatically created using TL;DR tags on Reddit. Figure 1 shows the distribution of lexical formality scores over these and the complete dataset (based on Equation 6).…”
Section: Methods
confidence: 99%
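Equation 6 is not reproduced in the excerpt. A widely used lexical formality measure of this kind is the F-score of Heylighen & Dewaele (1999), which rises with the share of nouns, adjectives, prepositions, and articles, and falls with pronouns, verbs, adverbs, and interjections; the sketch below computes it with NLTK under the unverified assumption that Equation 6 belongs to this family.

```python
import nltk  # requires nltk.download("punkt") and nltk.download("averaged_perceptron_tagger")

# Penn Treebank tag prefixes mapped to the F-score's "formal" and "deictic"
# categories (Heylighen & Dewaele, 1999). DT approximates articles; this
# coarse mapping is an assumption of the sketch.
FORMAL = ("NN", "JJ", "IN", "DT")           # nouns, adjectives, prepositions, articles
DEICTIC = ("PRP", "VB", "RB", "UH", "WP")   # pronouns, verbs, adverbs, interjections

def formality_score(text: str) -> float:
    tokens = nltk.word_tokenize(text)
    tags = [tag for _, tag in nltk.pos_tag(tokens)]
    if not tags:
        return 50.0  # neutral score for empty input
    freq = lambda prefixes: 100.0 * sum(
        tag.startswith(prefixes) for tag in tags) / len(tags)
    # F = (formal% - deictic% + 100) / 2, ranging roughly from 0 to 100.
    return (freq(FORMAL) - freq(DEICTIC) + 100.0) / 2.0

print(formality_score("The committee approved the proposal."))   # higher
print(formality_score("Well, I really just think it's fine."))   # lower
```

News text such as CNN/DailyMail would be expected to score higher on such a measure than Reddit posts, which is consistent with the comparison the cited passage describes.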
“…To assess the possible benefits of reinforcing over the proposed QG-based metric, which does not require human-generated reference summaries, we employ TL;DR, a large-scale dataset for automatic summarization built on social media data, comprising 4 million training pairs (Völske et al., 2017). Both the CNN-DM and TL;DR datasets are in English.…”
Section: Data Used
confidence: 99%
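For readers who want to experiment with the corpus, it is mirrored on the Hugging Face Hub; the dataset identifier and the content/summary field names below reflect our understanding of that mirror and should be verified against its dataset card.

```python
from datasets import load_dataset

# Load the Webis-TLDR-17 mirror (~4M Reddit posts with TL;DR summaries).
# The id and field names are assumptions; newer versions of `datasets`
# may require trust_remote_code=True for script-based datasets.
ds = load_dataset("webis/tldr-17", split="train")

example = ds[0]
print(example["content"][:200])  # the long post body
print(example["summary"])        # the author-provided TL;DR
```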
“…TL;DR Reddit corpus (Völske et al., 2017): This is the dataset for the TL;DR challenge. They pro-… The Extractive Summarization and Abstractive Summarization modules are fine-tuned on each dataset to obtain the respective results.…”
Section: Datasets and Experimental Setup
confidence: 99%
“…TL;DR Reddit corpus (Völske et al., 2017): This is the dataset for the TL;DR challenge. They pro-…

Algorithm 2: Order Preserving Selection
1: A = list(<sentence, id, score>)
2: procedure REORDER(A)
3:   sortedA = sortByScore(A)

… et al. (2015), Gavrilov (2017), and the PGN implementation of See et al. (2017), Kumar (2019) are used as references for the abstractive module.…”
Section: Datasets and Experimental Setup
confidence: 99%
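The pseudocode fragment above is cut off mid-procedure. A plausible completion of order-preserving selection is to pick the top-k sentences by score but emit them in their original document order; the sketch below is our reconstruction under that assumption, not the cited paper's code.

```python
from typing import List, Tuple

# A scored sentence: (sentence text, position id in the document, model score).
Scored = Tuple[str, int, float]

def order_preserving_selection(A: List[Scored], k: int = 3) -> List[str]:
    """Select the k highest-scoring sentences, then restore document order.

    Reconstructed from the truncated "Algorithm 2" above; the top-k cutoff
    and tie-breaking behavior are assumptions of this sketch.
    """
    sorted_a = sorted(A, key=lambda s: s[2], reverse=True)  # sortByScore(A)
    selected = sorted_a[:k]
    selected.sort(key=lambda s: s[1])  # reorder by original position id
    return [sentence for sentence, _, _ in selected]

sentences = [("Intro sentence.", 0, 0.2), ("Key finding.", 1, 0.9),
             ("Detail.", 2, 0.7), ("Aside.", 3, 0.1), ("Conclusion.", 4, 0.8)]
print(order_preserving_selection(sentences, k=3))
# ['Key finding.', 'Detail.', 'Conclusion.'] (document order preserved)
```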