2021
DOI: 10.48550/arxiv.2110.10150
Preprint

Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents

Abstract: Text summarization is an essential task to help readers capture salient information from documents, news, interviews, and meetings. However, most state-of-the-art pretrained language models are unable to efficiently process long text commonly seen in the summarization problem domain. In this paper, we propose SUMM N , a simple, flexible, and effective multi-stage framework for input texts that are longer than the maximum context lengths of typical pretrained LMs. SUMM N first generates the coarse summary in mu…

Cited by 5 publications (6 citation statements)
References 33 publications
“…Hierarchical models [ 23 ] iteratively merge paragraph-level dependencies. Segmentation-based approaches [ 24 , 25 , 26 ] with fusion-in-decoder [ 27 ] and marginalized decoding [ 28 ] divide the input into meaningful units to produce a summary. Extract-then-abstract procedures [ 29 ] pick a subset of relevant sentences from the source to generate the outline, eventually relying on marginalization [ 30 , 31 ].…”
Section: Related Work
confidence: 99%
“…While there do exist more powerful dialogue summarization models such as DialogLM [29] and Summ^N [28], we use the BART (Bidirectional and Auto-Regressive Transformers) model [12] due to its speed and high performance in long document summarization tasks [11]. In addition, there has been previous research in assessing different topic segmentation methods on the BART model, so this allows us to evaluate our techniques.…”
Section: BART Model For Meeting Summarization
confidence: 99%
“…The simplest approach involves truncating the lengthy input text into a shorter sequence within a predefined maximum length [9, 10]. While this allows the use of off-the-shelf LLMs, it is heavily influenced by lead bias and can lead to significant information loss as the document length increases [11, 12, 13]. Another approach, text chunking, involves breaking down a long document into smaller, semantically similar segments and processing each segment independently before aggregation [8, 14].…”
Section: Related Work
confidence: 99%
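The truncation-versus-chunking trade-off described in the statement above can be illustrated with a minimal sketch. The whitespace "tokenizer" and the 10-token context limit are illustrative assumptions, not values taken from any particular model.

```python
# Sketch of the two long-input strategies: (1) truncation to a maximum
# context length, which discards the tail of the document (lead bias),
# and (2) chunking into consecutive segments that can each be processed
# independently and then aggregated. Whitespace tokenization and the
# 10-token limit are assumptions for illustration only.

def truncate(tokens, max_len):
    """Keep only the first max_len tokens; everything after is lost."""
    return tokens[:max_len]

def chunk(tokens, max_len):
    """Split tokens into consecutive segments of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

doc = ("the meeting opened with a budget review and closed with "
       "a vote on the new hiring plan for next quarter").split()

truncated = truncate(doc, 10)  # the closing vote is discarded
segments = chunk(doc, 10)      # every token survives in some segment
```

Under truncation the decision at the end of the meeting never reaches the summarizer, whereas chunking preserves all content at the cost of needing an aggregation step over the per-segment outputs.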