On Extractive and Abstractive Neural Document Summarization with Transformer Language Models

Pilault, Jonathan; Li, Raymond; Subramanian, Sandeep; Pal, Chris

doi:10.18653/v1/2020.emnlp-main.748

Cited by 119 publications

(93 citation statements)

References 27 publications

“…By setting γ=0.0, our method is comparable to the attention-based method in Manakul and Gales (2020). By setting γ=1.0, our method is similar to the extractive models in Hsu et al (2018); Pilault et al (2020). In Table 4, we show that when coupled with BART, MCS yields better summarization performance than both Attn-only and Ext-only baselines.…”

Section: Multitask Content Selection (MCS) (mentioning)

confidence: 84%

“…Alternatively, earlier methods show that good content selection helps abstractive news summarization systems (Chen and Bansal, 2018; Gehrmann et al, 2018; Hsu et al, 2018). Hybrid systems that select sentences and generate an abstractive summary have been proposed such as extractive system + TLM for scientific articles (Pilault et al, 2020), simple selection + BART for podcasts (Manakul and Gales, 2020; Song et al, 2020), and guided summarization by BERT-based keyword/sentence extraction + BART for news and scientific articles (He et al, 2020; Dou et al, 2021).…”

Section: Related Work (mentioning)

confidence: 99%

Long-Span Summarization via Local Attention and Content Selection

Manakul, Gales. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer

Transformer-based models have achieved state-of-the-art results in a wide range of natural language processing (NLP) tasks including document summarization. Typically these systems are trained by fine-tuning a large pretrained model to the target task. One issue with these transformer-based models is that they do not scale well in terms of memory and compute requirements as the input length grows. Thus, for long document summarization, it can be challenging to train or fine-tune these models. In this work, we exploit large pre-trained transformer-based models and address long-span dependencies in abstractive summarization using two methods: local self-attention; and explicit content selection. These approaches are compared on a range of network configurations. Experiments are carried out on standard long-span summarization tasks, including Spotify Podcast, arXiv, and PubMed datasets. We demonstrate that by combining these methods, we can achieve state-of-the-art results on all three tasks in the ROUGE scores. Moreover, without a large-scale GPU card, our approach can achieve comparable or better results than existing approaches.

“…Work in this area has mostly used abstracts or peer reviews as targets (Cachola et al, 2020; Cohan et al, 2018; Jaidka et al, 2017). In particular, Pilault et al (2020) show that using a simple extractive summary as input for abstractive summarization of scholarly texts works well. Researchers have also used citing sentences as part of the input for summarization, recognizing the explanatory power of these texts (Nakov et al, 2004; Cohan and Goharian, 2017; Yasunaga et al, 2019).…”

Section: Related Work (mentioning)

confidence: 96%

Explaining Relationships Between Scientific Documents

Luu, Wu, Koncel-Kedziorski et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer

We address the task of explaining relationships between two scientific documents using natural language text. This task requires modeling the complex content of long technical documents, deducing a relationship between these documents, and expressing that relationship in text. Successful solutions can help improve researcher efficiency in search and review. In this paper, we operationalize this task by using citing sentences as a proxy. We establish a large dataset for our task. We pretrain a large language model to serve as the foundation for autoregressive approaches to the task. We explore the impact of taking different views on the two documents, including the use of dense representations extracted with scientific information extraction systems. We provide extensive automatic and human evaluations which show the promise of such models, and make clear the challenges for future work.

“…How abstractive are the summaries? Abstractive summarizers generate surprisingly extractive summaries, copying large fragments unmodified from the input documents into the summaries (Weber et al, 2018; Pilault et al, 2020). We hypothesize that providing graph representations of the input can help the model abstract away from the specific lexical content of the input and generate summaries that are more abstractive.…”

Section: Ablations and Analyses (mentioning)

confidence: 99%

Efficiently Summarizing Text and Graph Encodings of Multi-Document Clusters

Pasunuru, Liu, Bansal et al. 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

This paper presents an efficient graph-enhanced approach to multi-document summarization (MDS) with an encoder-decoder Transformer model. This model is based on recent advances in pre-training both encoder and decoder on very large text data, and it incorporates an efficient encoding mechanism (Beltagy et al., 2020) that avoids the quadratic memory growth typical for traditional Transformers. We show that this powerful combination not only scales to large input documents commonly found when summarizing news clusters; it also enables us to process additional input in the form of auxiliary graph representations, which we derive from the multi-document clusters. We present a mechanism to incorporate such graph information into the encoder-decoder model that was pre-trained on text only. Our approach leads to significant improvements on the Multi-News dataset, overall leading to an average 1.8 ROUGE score improvement over previous work. We also show improvements in a transfer-only setup on the DUC-2004 dataset. The graph encodings lead to summaries that are more abstractive. Human evaluation shows that they are also more informative and factually more consistent with their input documents.
