Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.120

A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal

Abstract: Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries and has important applications in story clustering for newsfeeds, presentation of search results, and timeline generation. However, there is a lack of datasets that realistically address such use cases at a scale large enough for training supervised models for this task. This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individu…

citations
Cited by 56 publications
(41 citation statements)
references
References 12 publications
“…Wikipedia has also been used to construct datasets for other text generation tasks, such as generating Wikipedia movie plots (Orbach and Goldberg, 2020; Rashkin et al., 2020) and short Wikipedia event summaries (Gholipour Ghalandari et al., 2020), and summarizing Wikipedia documents (Zopf, 2018; or summaries of aspects of interests (Hayashi et al., 2020) from relevant documents.…”
Section: Related Work (mentioning)
confidence: 99%
“…A small number of MDS datasets are available for other domains, including MultiNews (Fabbri et al., 2019), WikiSum, and Wikipedia Current Events (Gholipour Ghalandari et al., 2020). Most similar to MS^2 is MultiNews, where multiple news articles about the same event are summarized into one short paragraph.…”
Section: Related Work (mentioning)
confidence: 99%
“…It is also noted that several large-scale MDS datasets have been introduced in the news domain (Fabbri et al., 2019; Gu et al., 2020; Gholipour Ghalandari et al., 2020), for creating Wikipedia lead paragraphs (Liu et al., 2018), and for long-form question answering. However, these do not focus on the conversational domain.…”
Section: Dataset Statistics (mentioning)
confidence: 99%