2022
DOI: 10.1609/aaai.v36i10.21409
Sequence Level Contrastive Learning for Text Summarization

Abstract: Contrastive learning models have achieved great success in unsupervised visual representation learning: they maximize the similarities between feature representations of different views of the same image while minimizing the similarities between feature representations of views of different images. In text summarization, the output summary is a shorter form of the input document, and the two have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, …
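To make the objective described in the abstract concrete, here is a minimal PyTorch sketch of an InfoNCE-style sequence-level contrastive loss between pooled document and summary representations. The pooling, cosine-similarity scoring, temperature, and function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sequence_contrastive_loss(doc_emb: torch.Tensor,
                              sum_emb: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss over a batch of (document, summary) pairs.

    doc_emb, sum_emb: [batch, dim] pooled sequence representations.
    Each document is pulled toward its own summary (the positive) and
    pushed away from the other summaries in the batch (the negatives).
    """
    doc_emb = F.normalize(doc_emb, dim=-1)
    sum_emb = F.normalize(sum_emb, dim=-1)
    logits = doc_emb @ sum_emb.t() / temperature        # [batch, batch] cosine similarities
    targets = torch.arange(doc_emb.size(0), device=doc_emb.device)
    return F.cross_entropy(logits, targets)             # diagonal entries are the positives

# Usage (assumed setup): combine with the usual maximum-likelihood summarization loss,
# e.g. total_loss = nll_loss + lambda_contrastive * sequence_contrastive_loss(doc_emb, sum_emb)
```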

Cited by 57 publications (14 citation statements)
References 24 publications
“…However, they also exhibit similar problems of stability as reinforcement learning. Contrastive Learning Recently, contrastive learning (Hadsell et al., 2006) has been introduced into several conditional text generation tasks, such as machine translation (Pan et al., 2021), text summarization (Cao and Wang, 2021; Xu et al., 2021; Sun and Li, 2021), and other tasks (Uehara et al., 2020; Cho et al., 2021; Lee et al., 2021b). Among these application scenarios, most work deployed contrastive learning in the latent representation space, following the framework proposed in .…”
Section: Related Work (mentioning)
confidence: 99%
“…GOLD (Pang and He, 2021) uses offline reinforcement learning to train the BART model by treating the reference summaries as demonstrations, a different formulation that can also improve the performance of the original BART. SeqCo (Xu et al., 2021) and ConSum (Sun and Li, 2021) are two recent methods that aim to leverage contrastive learning to improve the performance of the abstractive summarization model (BART). Implementation Details In the following experiments, we use either BART or PEGASUS as a backbone.…”
Section: Experimental Settings (mentioning)
confidence: 99%
“…Analysing the convergence of CIAug For all benchmark datasets, we observe that CIAug reaches a benchmark F1 score faster than the Mixup method, as shown in Figure 2. As CIAug selects samples for Mixup based on a learning curriculum, it leads to the generation of more suitable synthetic samples in a staggered manner, resulting in better training (Xu et al., 2021). Using the NLPAug library (Ma, 2019), we substitute up to 10% of the words in each sentence with their synonyms found in WordNet (Feinerer and Hornik, 2020) and present the results in Table 4. We observe that both CIAug-NT and CIAug are more robust compared to regular Mixup, by a difference of 6.72% and 6% respectively.…”
Section: Impact of Distance Metric (mentioning)
confidence: 99%
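The robustness check quoted above (replacing up to 10% of words with WordNet synonyms via NLPAug) can be reproduced roughly as follows. The aug_p value and the handling of augment()'s return type are assumptions that may differ across nlpaug versions, and the sample sentence is made up for illustration.

```python
# Rough sketch of WordNet synonym substitution with the NLPAug library.
# Requires the nltk WordNet corpus to be available locally.
import nlpaug.augmenter.word as naw

aug = naw.SynonymAug(aug_src='wordnet', aug_p=0.1)  # substitute up to ~10% of words

sentence = "Contrastive learning improves abstractive summarization models."
perturbed = aug.augment(sentence)  # newer nlpaug versions return a list of strings
print(perturbed)
```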
“…We propose CIAug, a method which addresses these challenges by offering an augmentation procedure that selects samples in an adaptive fashion and is geometrically sound. CIAug's sampling strategy follows the idea that selecting easier mixing samples first and gradually increasing sample difficulty based on relative spatial position would generate more suitable synthetic inputs, resulting in better model training (Xu et al., 2021). This notion ties in with the framework of curriculum learning (Krueger and Dayan, 2009), where training data is presented in a similarly staggered way, increasing model capabilities (Bengio et al., 2009).…”
Section: Introduction (mentioning)
confidence: 99%
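The curriculum idea quoted above (mix with easy, nearby partners first and harder, more distant ones later) can be sketched generically as below. This is not CIAug's exact formulation; the Euclidean distance metric, linear schedule, and function names are illustrative assumptions.

```python
import numpy as np

def curriculum_mixup(embeddings: np.ndarray, epoch: int, total_epochs: int,
                     alpha: float = 0.2):
    """Pick a mixup partner per sample, nearby (easy) first, farther (hard) later.

    Generic illustration of curriculum-ordered mixup over sample embeddings:
    partners are ranked by Euclidean distance, and the allowed rank grows
    linearly with training progress.
    """
    n = embeddings.shape[0]
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    order = np.argsort(dists, axis=1)                   # column 0 is the sample itself
    max_rank = max(1, int((epoch / total_epochs) * (n - 1)))
    ranks = np.random.randint(1, max_rank + 1, size=n)  # skip rank 0 (self)
    partners = order[np.arange(n), ranks]
    lam = np.random.beta(alpha, alpha)                  # standard mixup coefficient
    mixed = lam * embeddings + (1 - lam) * embeddings[partners]
    return mixed, partners, lam
```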
“…Contrastive learning methods encourage models to distinguish between positive and negative examples (Nan et al., 2021; Cao and Wang, 2021; Xu et al., 2022). Nan et al. (2021) generate multiple summary candidates by sampling from the pre-trained models and select positive and negative examples according to a question-answering-based metric.…”
Section: Abstractive Summarization (mentioning)
confidence: 99%
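As a rough illustration of the setup described in the last statement (separating positive from negative candidate summaries chosen by an external quality metric), a pairwise margin loss over model scores might look like the following. The margin form and the score definition are assumptions, not the exact objectives of the cited papers.

```python
import torch
import torch.nn.functional as F

def candidate_contrastive_loss(pos_scores: torch.Tensor,
                               neg_scores: torch.Tensor,
                               margin: float = 1.0) -> torch.Tensor:
    """Margin loss pushing model scores of positive candidates above negative ones.

    pos_scores / neg_scores: model scores (e.g. length-normalized log-likelihoods)
    for candidate summaries labeled positive or negative by an external metric
    such as a QA-based quality measure.
    """
    return F.relu(margin - pos_scores + neg_scores).mean()
```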