2023
DOI: 10.22541/au.167528156.61938925/v1
Preprint

Identifying Machine-Paraphrased Plagiarism

Abstract: Employing paraphrasing tools to conceal plagiarized text is a severe threat to academic integrity. To enable the detection of machine-paraphrased text, we evaluate the effectiveness of five pre-trained word embedding models combined with machine learning classifiers and state-of-the-art neural language models. We analyze preprints of research papers, graduation theses, and Wikipedia articles, which we paraphrased using different configurations of the tools SpinBot and SpinnerChief. The best performing techniq…

Cited by 7 publications (4 citation statements)
References 47 publications (89 reference statements)

“…Transparency - Explainability. Being a multifaceted concept, the term "transparency" is both used to refer to technical explainability [47,65,77,120,124] as well as organizational openness [68,132,136,137]. Regarding the former, papers underscore the need for mechanistic interpretability [124] and for explaining internal mechanisms in generative models [65].…”
Section: Governance - Regulation
Citation type: mentioning
confidence: 99%
“…After preprocessing, it contains over 29,000,000 paragraphs. • Machine-Paraphrased Plagiarism: contains 200,767 paragraphs (50% of which are paraphrased using the SpinBot API) extracted from Wikipedia (English) articles [36]. This dataset will be used for the evaluation of representations generated by the proposed model.…”
Section: A Datasets
Citation type: mentioning
confidence: 99%
“…the European Research Council), these would need to be included in future funding schemes, providing adequate support to ecologists. Transparent reporting of the responsible use of AI-generated content in scientific work including detailed (and ideally machine-actionable) provenance information must be part of such an open science approach (Wahle et al, 2023). This would also safeguard gap-filling techniques (Table 1) from misuse via fabricated data.…”
Section: Risk Mitigation
Citation type: mentioning
confidence: 99%