2020
DOI: 10.1007/978-3-030-60450-9_42

CLTS: A New Chinese Long Text Summarization Dataset

Abstract: The lack of creative ability in abstractive methods is a particular problem in automatic text summarization. The summaries generated by models are mostly extracted from the source articles. One of the main causes of this problem is the lack of datasets with abstractiveness, especially for Chinese. To solve this problem, we paraphrase the reference summaries in CLTS, the Chinese Long Text Summarization dataset, correct errors of factual inconsistencies, and propose the first Chinese Long Text Summarizat…

Cited by: 11 publications (3 citation statements)
References: 30 publications
“…TTNews [9] is provided for NLPCC Single Document Summarization competition 2 , including 50,000 training examples with summaries and 50,000 without summaries. CLTS [10] is a Chinese summarization dataset extracted from the news website ThePaper. It contains more than 180,000 long articles and summaries written by editors of the website.…”
Section: Related Work (mentioning)
confidence: 99%
“…Currently, most Chinese summarization datasets are collected from Chinese social media Weibo, which are limited to a 140-character length [7,8]. Some other datasets are scraped from news websites, such as Toutiao [9] and ThePaper [10]. However, those datasets are either small-scale or of low quality.…”
Section: Introduction (mentioning)
confidence: 99%
“…In order to evaluate our algorithm more comprehensively, we made experiments on four different datasets, CNN/DailyMail [6], NYT [14], TTNews [7] and CLTS [8]. Both the Chinese and English datasets are composed of relatively shorter articles datasets and relatively longer articles datasets.…”
Section: Datasets (mentioning)
confidence: 99%