2020
DOI: 10.48550/arxiv.2006.09719
Preprint

Automatically Ranked Russian Paraphrase Corpus for Text Generation

Abstract: The article focuses on the automatic creation and ranking of a large corpus for Russian paraphrase generation, which proves to be the first corpus of its kind in Russian computational linguistics. Existing manually annotated paraphrase datasets for Russian are limited to the small ParaPhraser corpus and ParaPlag, which are suitable for a range of NLP tasks such as paraphrase and plagiarism detection and sentence similarity and relatedness estimation. Due to size restrictions, these datasets can hardly be a…

Cited by 2 publications (2 citation statements). References: 13 publications.
“…We evaluate the golden references (first answer) from the fixed RuSimpleSentEval-2021 test sets (public/private); • Paraphraser. We use a paraphrase model trained on 7000 examples from different sources of various domains: 1) text level: literature domain, prose; back translation (with ru-en translation model) of the texts from different domains filtered with Bertscore Rouge-L); 2) sentence level: Russian version of Tapaco corpus (Scherrer, 2020) and filtered ParaphraserPlus (Gudkov et al, 2020) corpus. • Fine-tuned paraphraser.…”
Section: Models | Citation type: mentioning | Confidence: 99%
“…The paraphraser was chosen as it's a free model that is provided as an API. The model was trained on 7000 examples from different sources of various domains: 1) text level - texts from different domains filtered with Bertscore (Zhang et al, 2019) and Rouge-L) 2) sentence level - the Russian version of Tapaco corpus (Scherrer, 2020) and filtered ParaphraserPlus (Gudkov et al, 2020) corpus. Russian news dataset for summarization was used as the source data for models generation.…”
Section: Data Collection | Citation type: mentioning | Confidence: 99%
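The citation statements above mention that text-level training data was filtered with BERTScore and ROUGE-L. The snippet below is a minimal sketch of the ROUGE-L part only. It computes a token-level ROUGE-L F1 score (longest common subsequence, with precision and recall weighted equally, which is a simplification) and keeps pairs whose score falls inside a window. The function names, the whitespace tokenisation, and the window bounds `low`/`high` are hypothetical choices made for illustration. The cited works do not state their thresholds. BERTScore is left out because it requires a pretrained model.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences (DP, O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Token-level ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref = reference.lower().split()   # naive whitespace tokenisation (assumption)
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:                       # also covers empty inputs, avoiding division by zero
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def filter_pairs(
    pairs: List[Tuple[str, str]], low: float = 0.2, high: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Keep (source, paraphrase) pairs whose ROUGE-L F1 lies in [low, high].

    Hypothetical window: below `low` the texts are likely unrelated,
    above `high` the "paraphrase" is close to a copy of the source.
    """
    kept = []
    for source, paraphrase in pairs:
        score = rouge_l_f1(source, paraphrase)
        if low <= score <= high:
            kept.append((source, paraphrase, score))
    return kept


if __name__ == "__main__":
    pairs = [
        ("кошка сидит на ковре", "на ковре сидит кошка"),  # reordered: F1 = 0.5
        ("кошка сидит на ковре", "кошка сидит на ковре"),  # identical: F1 = 1.0
        ("кошка сидит на ковре", "сегодня идёт дождь"),    # unrelated: F1 = 0.0
    ]
    for source, paraphrase, score in filter_pairs(pairs):
        print(f"{score:.2f}\t{source}\t{paraphrase}")
```

Running the script prints only the reordered pair, with score 0.50. The identical pair is rejected as a near-copy and the unrelated pair as having no overlap. A two-sided window is just one plausible design for paraphrase filtering. A production pipeline would more likely use a tested scorer and a proper Russian tokeniser.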