Findings of the Association for Computational Linguistics: EMNLP 2021 (2021)
DOI: 10.18653/v1/2021.findings-emnlp.22

An Unsupervised Method for Building Sentence Simplification Corpora in Multiple Languages

Abstract: Parallel sentence simplification (SS) corpora are scarce, which limits neural SS modeling. We propose an unsupervised method to build SS corpora from large-scale bilingual translation corpora, alleviating the need for supervised SS corpora. Our method is motivated by two findings: neural machine translation models tend to generate more high-frequency tokens, and the source and target languages of a translation corpus differ in text complexity level. By taking the p…

Citations: cited by 11 publications (19 citation statements)
References: 40 publications
“…In this way, Martin et al (2022) employed sentence embedding modeling to measure the similarity between sentences from approximately one billion CC-NET (Wenzek et al, 2020) sentences, and subsequently constructed a new parallel SS corpus. Lu et al (2021) utilize a machine translation corpus to construct SS corpora via the back-translation technique. Compared with WikiLarge, these corpora can help to enhance the performance of supervised SS methods.…”
Section: LLM
Citation type: mentioning
confidence: 99%
“…In the experiments, we use the standard simplification evaluation package EASSE to calculate the SARI and FKGL metrics. We calculate FRES by using the script from Lu et al (2021).…”
Section: Evaluation Settings
Citation type: mentioning
confidence: 99%
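
For readers unfamiliar with the readability metrics named in the statement above, the following is a minimal sketch of the standard Flesch Reading Ease (FRES) and Flesch-Kincaid Grade Level (FKGL) formulas. It is not the EASSE implementation or the script from Lu et al (2021); the function names and the vowel-group syllable counter are assumptions made for illustration, so the scores will differ from those tools.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for English text using the standard formulas."""
    # Naive sentence and word segmentation, good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    # Higher FRES means easier text; higher FKGL means harder text.
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl


if __name__ == "__main__":
    simple = "The cat sat."
    complex_ = "The committee deliberated extensively regarding institutional accountability."
    print(readability(simple))
    print(readability(complex_))
```

Under these formulas a simplified sentence should score a higher FRES and a lower FKGL than its more complex counterpart, which is the direction of change a simplification system is expected to produce.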
“…Martin et al (2020b) proposed unsupervised mining technology to create multi-language simplification corpora automatically. Lu et al (2021) used the back-translation approach to construct a large-scale pseudo sentence simplification corpus.…”
Section: Mine Data For Simplification
Citation type: mentioning
confidence: 99%
“…We can treat CIP task as a special paraphrase generation task. The general paraphrase generation task aims to rephrase a given sentence to another one that possesses identical semantics but various lexicons or syntax [6], [7]. Similarly, CIP emphasizes rephrasing the idioms of input sentences to word segments that reflect more intuitive and understandable paraphrasing.…”
Section: Table
Citation type: mentioning
confidence: 99%