Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.360

WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization

Abstract: We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of cross-lingual abstractive summarization systems. We extract article and summary pairs in 18 languages from WikiHow, a high-quality, collaborative resource of how-to guides on a diverse set of topics written by human authors. We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article. As a set of baselines for further studies, we evaluate t…

Citations: cited by 105 publications (107 citation statements)
References: 32 publications
“…The human evaluation results correlate with automated evaluation as shown in Tables 1 and 2. Ladhak et al. (2020) reported cross-lingual ATS score with same data for four different languages. The R-L score for four languages are 34.…”
Section: Abstractive Text Summarization (ATS); citation type: mentioning
confidence: 97%
“…In Abstractive Text Summarization (ATS), we aim to generate grammatically coherent, semantically correct and abstractive summary given an input document. We use recently released WikiLingua (Ladhak et al., 2020) cross-lingual abstractive summarization dataset containing data in 18 languages. Prior splits are not available for this dataset.…”
Section: Abstractive Text Summarization (ATS); citation type: mentioning
confidence: 99%
“…The summarization landscape can be roughly divided into three primary summary-forms: (1) Single sentence (Napoles et al., 2012; Grusky et al., 2018; Narayan et al., 2018; Kim et al., 2019) - summarize the document in a single sentence; (2) Highlights (Hermann et al., 2015; Koupaee and Wang, 2018; Ladhak et al., 2020) - a summary in the form of bullets listing the key points in the text; (3) Coherent summary (Sharma et al., 2019; Cohan et al., 2018) - short coherent paragraphs describing the salient information. The summarization datasets from the news domain, which are commonly used for human evaluation, include summaries in the form of highlights or single-sentence summaries.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…PubMed, arXiv, and BigPatent (Cohan et al., 2018; Sharma et al., 2019) provide a summary in the form of coherent paragraphs (i.e., each sentence flows smoothly into the next). In contrast, other summarization datasets (Hermann et al., 2015; Grusky et al., 2018; Koupaee and Wang, 2018; Ladhak et al., 2020) offer a summary in the form of a key points list (i.e., highlights). In this paper, we focus on coherent paragraph summarization datasets.…”
Section: Introduction; citation type: mentioning
confidence: 99%