QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization

Zhong, Ming; Yin, Dong; Yu, Tao; Zaidi, Ahmad; Mutuma, Mutethia; Jha, Rahul; Awadallah, Ahmed Hassan; Çelikyılmaz, Aslı; Liu, Yang; Qiu, Xipeng; Radev, Dragomir

doi:10.48550/arxiv.2104.05938

Cited by 27 publications

(12 citation statements)

References 44 publications


“…Query-based Multi-domain Meeting Summarization (QMsum) is a very large conference abstract dataset, containing conferences spanning multiple domains. The cross-domain QMSum dataset was designed to improve the generalisation performance of the model and to provide a venue for evaluating the performance of the model across different domain conferences [8].…”

Section: Data (mentioning)

confidence: 99%

Query-Based Dialogue Summarization Using BART

2024

ACE


Conversation summarisation is the transformation of long conversational texts into concise and accurate summaries, the importance of which lies in improving the user experience and information filtering. As an important natural language processing task, conversation summarisation can provide concise and accurate information and avoid repetition and redundancy. In the dialogue summarisation task, pre-trained language models can be used to summarise long conversations and generate concise and accurate summaries. The aim of this paper is to investigate the possibility of using bidirectional and auto-regressive transformer models for dialogue summarisation tasks. In our experiments, we analysed the characteristics of the Query-based Multi-domain Meeting Summarization (QMSum) dialogue summarisation dataset, proposed a dialogue summarisation model based on the Bidirectional and Auto-Regressive Transformer model, and designed evaluation experiments to compare its performance with other methods in the dialogue summarisation task. The experimental results show that the results of this thesis are important for facilitating the development of dialogue summarisation tasks and the application of the Bidirectional and Auto-Regressive Transformer model.


“…Compared to previous news summarization datasets, it has significantly longer input and output. QMSum (Zhong et al 2021) is a benchmark for query-focused dialogue summarization. The dataset is composed of meeting records from three different domains.…”

Section: Experiments Dataset (mentioning)

confidence: 99%

“…We conducted experiments on two long-input summarization datasets: arXiv (Cohan et al 2018) for long-document summarization, and QMSum (Zhong et al 2021) for long-dialogue summarization. Taking a recent proposed extract-generate summarization model DYLE (Mao et al 2021) as the backbone, our approach achieves improvement on both datasets compared to the base model and obtains the state-of-the-art result on QMSum.…”

Section: Introduction (mentioning)

confidence: 99%

Preserve Context Information for Extract-Generate Long-Input Summarization Framework

Yuan, Wang, Cao et al. 2023

AAAI


The extract-generate framework has been a classic approach for text summarization. As pretrained language models struggle with long-input summarization because of their high memory cost, the extract-generate framework is regaining researchers' interest. However, the cost of its effectiveness in dealing with long-input summarization is the loss of context information. In this paper, we present a context-aware extract-generate framework (CAEG) for long-input text summarization. It focuses on preserving both local and global context information in an extract-generate framework at little cost, and can be applied to most existing extract-generate summarization models. CAEG generates a set of context-related text spans called context prompts for each text snippet and uses them to transfer the context information from the extractor and generator. To find such context prompts, we propose to capture the context information based on the interpretation of the extractor, where the text spans having the highest contribution to the extraction decision are considered as containing the richest context information. We evaluate our approach on both long-document and long-dialogue summarization datasets: arXiv and QMSum. The experiment results show that CAEG achieves the state-of-the-art result on QMSum and outperforms other extract-generate based models on arXiv.


“…Moreover, extracting user attributes from an open-domain conversations [Wu et al, 2019], getting to know the user through conversations, can be marked as one of the potential applications. The very recent proposed query-based meeting summarization dataset, QMSum [Zhong et al, 2021], can be viewed as one application of treating conversations as database and conduct an abstractive question answering task.…”

Section: Related Work (mentioning)

confidence: 99%

QAConv: Question Answering on Informative Conversations

Wu, Madotto, Liu et al. 2021

Preprint


This paper introduces QAConv, a new question answering (QA) dataset that uses conversations as a knowledge source. We focus on informative conversations including business emails, panel discussions, and work channels. Unlike open-domain and task-oriented dialogues, these conversations are usually long, complex, asynchronous, and involve strong domain knowledge. In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions, from 10,259 selected conversations with both human-written and machine-generated questions. We segment long conversations into chunks, and use a question generator and dialogue summarizer as auxiliary tools to collect multi-hop questions. The dataset has two testing scenarios, chunk mode and full mode, depending on whether the grounded chunk is provided or retrieved from a large conversational pool. Experimental results show that state-of-the-art QA systems trained on existing QA datasets have limited zero-shot ability and tend to predict our questions as unanswerable. Fine-tuning such systems on our corpus can achieve significant improvement up to 23.6% and 13.6% in both chunk mode and full mode, respectively.
