Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016
DOI: 10.18653/v1/d16-1034

PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification

Abstract: In this paper we present PaCCSS-IT, a Parallel Corpus of Complex-Simple Sentences for ITalian. To build the resource we develop a new method for automatically acquiring a corpus of complex-simple paired sentences able to intercept structural transformations and particularly suitable for text simplification. The method requires a wide amount of texts that can be easily extracted from the web making it suitable also for less-resourced languages. We test it on the Italian language making available the biggest Ita…

Cited by 23 publications (19 citation statements)
References 16 publications
“…In terms of sentence aligned corpora for text simplification, different versions of aligned Wiki-Simple Wikipedia sentences have been used in NLP research (Zhu et al, 2010; Coster and Kauchak, 2011; Hwang et al, 2015). Different supervised and unsupervised approaches were proposed to construct such corpora (Bott and Saggion, 2011; Klerke and Søgaard, 2012; Klaper et al, 2013; Brunato et al, 2016). Our corpus adds a new resource for the English text simplification task.…”
Section: Introduction (mentioning)
confidence: 99%
“…and it is thus more suitable to catch the "layman" intuition of sentence complexity. For these reasons, this method has been used in recent works in the field of readability and text simplification; it is the case of Lasecki et al (2015); Clercq et al (2013); Brunato et al (2016) where the crowd was asked to evaluate the level of complexity or the degree of informativeness of simplified sentences compared to the original one.…”
Section: Introduction (mentioning)
confidence: 99%
“…• A subset of the PaCCSS-it corpus (Brunato et al, 2016), which contains 63,000 complex-to-simple sentence pairs automatically extracted from the Web. In order to extract only the pairs of higher quality, we pre-processed the corpus by discarding sentence pairs with special characters, misspellings, non-matching numerals or dates, and a cosine similarity below 0.5. mal language, including Italian Opensubtitles, 2 the Paisà corpus (Lyding et al, 2014), Wikipedia and the collection of Italian laws.…”
Section: Italian (mentioning)
confidence: 99%
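
The pair filtering described in the citation statement above (discarding pairs with non-matching numerals or a cosine similarity below 0.5) can be illustrated with a minimal sketch. The citing paper does not say which sentence representation the cosine was computed over, so the bag-of-words count vectors used here are an assumption, and the helper names (`tokens`, `cosine_similarity`, `numerals`, `keep_pair`) are hypothetical. The special-character and misspelling checks are omitted.

```python
import math
import re
from collections import Counter


def tokens(sentence):
    # Lowercased word tokens; a deliberate simplification of real tokenization.
    return re.findall(r"\w+", sentence.lower())


def cosine_similarity(a, b):
    # Cosine between bag-of-words count vectors (an assumed representation).
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def numerals(sentence):
    # Sorted digit sequences, so the comparison ignores word order.
    return sorted(re.findall(r"\d+", sentence))


def keep_pair(complex_s, simple_s, threshold=0.5):
    # Drop pairs whose numerals differ or whose similarity is below threshold.
    if numerals(complex_s) != numerals(simple_s):
        return False
    return cosine_similarity(complex_s, simple_s) >= threshold
```

With these assumptions, a pair such as ("Il gatto dorme sul divano rosso", "Il gatto dorme sul divano") is kept, while a pair whose years differ or whose sentences share no words is discarded.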