Quantifying Appropriateness of Summarization Data for Curriculum Learning

Kano, Ryuji; Takahashi, Takumi; Nishino, Toru; Taniguchi, Masateru; Taniguchi, Takao; Ohkuma, Tomoko

doi:10.18653/v1/2021.eacl-main.119

Cited by 2 publications

(2 citation statements)

References 18 publications


“…Given that research has shown that the training data of summarization models are noisy, researchers have proposed methods for training summarization models based on noisy data. For example, Kano et al (2021) propose a model that can quantify noise to train summarization models from noisy data. The improvement of the models indicates that the noisy data has noticeable impacts for the training of the models.…”

Section: Related Work (mentioning)

confidence: 99%

Automatically Discarding Straplines to Improve Data Quality for Abstractive News Summarization

Keleg, Lindemann, Liu, et al. 2022

Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP


Recent improvements in automatic news summarization fundamentally rely on large corpora of news articles and their summaries. These corpora are often constructed by scraping news websites, which results in including not only summaries but also other kinds of texts. Apart from more generic noise, we identify straplines as a form of text scraped from news websites that commonly turn out not to be summaries. The presence of these nonsummaries threatens the validity of scraped corpora as benchmarks for news summarization. We have annotated extracts from two news sources that form part of the Newsroom corpus (Grusky et al., 2018), labeling those which were straplines, those which were summaries, and those which were both. We present a rule-based strapline detection method that achieves good performance on a manually annotated test set. Automatic evaluation indicates that removing straplines and noise from the training data of a news summarizer results in higher quality summaries, with improvements as high as 7 points ROUGE score.


“…We build on the approach introduced in (Xu et al, 2020); however, the core differences are both the downstream tasks (classification versus abstractive summarization) and the difficulty metrics. In contrast to the only other summarization work that we know of, Kano et al (2021) focus on large datasets, while we focus on low resource domains. We also introduce two different difficulty metrics (ROUGE and specificity).…”

Section: Related Work (mentioning)

confidence: 99%

Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization

Magooda, Litman 2021

Preprint


This paper explores three simple data manipulation techniques (synthesis, augmentation, curriculum) for improving abstractive summarization models without the need for any additional data. We introduce a method of data synthesis with paraphrasing, a data augmentation technique with sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness. We conduct experiments to show that these three techniques can help improve abstractive summarization across two summarization models and two different small datasets. Furthermore, we show that these techniques can improve performance when applied in isolation and when combined.
