Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021
DOI: 10.18653/v1/2021.emnlp-main.500

BiSECT: Learning to Split and Rephrase Sentences with Bitexts

Abstract: An important task in NLP applications such as sentence simplification is the ability to take a long, complex sentence and split it into shorter sentences, rephrasing as necessary. We introduce a novel dataset and a new model for this 'split and rephrase' task. Our BISECT training data consists of 1 million long English sentences paired with shorter, meaning-equivalent English sentences. We obtain these by extracting 1-2 sentence alignments in bilingual parallel corpora and then using machine translation to con…

Cited by 7 publications
(5 citation statements)
references
References 27 publications
“…While there have been concerted efforts in the past in the NLP community to develop metrics and corpora purported to serve studies in simplification (Xu et al, 2015; Zhang and Lapata, 2017; Narayan et al, 2017; Botha et al, 2018; Sulem et al, 2018a; Niklaus et al, 2019; Kim et al, 2021), they fell far short of addressing how their study contributes to improving the text comprehensibility. A part of our goal is to break away from a prevailing view that relegates readability to a sideline.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although the CRF model handles unaligned sentences by default, the resulting data remains noisy. We therefore adapt the method described in Kim et al (2021), initially used in the context of sentence-splitting, to further filter misalignments in MC-NOISY. Ultimately, this method was applied to the entire training set, which includes the entirety of MC-NOISY as well as the portion of MC-CLEAN not used for testing or validation.…”
Section: Filtering (mentioning)
confidence: 99%
“…While there have been concerted efforts in the past in the NLP community to develop metrics and corpora purported to serve studies in simplification (Zhang and Lapata, 2017; Sulem et al, 2018a; Botha et al, 2018; Niklaus et al, 2019; Kim et al, 2021; Xu et al, 2015), they fell far short of addressing how their work contributes to improving the text comprehensibility by readers. Part of our goal is to break away from a prevailing view that relegates the readability to a sideline.…”
Section: Related Work (mentioning)
confidence: 99%