2023
DOI: 10.48550/arxiv.2303.13408
Preprint

Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

Abstract: To detect the deployment of large language models for malicious use cases (e.g., fake content creation or academic plagiarism), several approaches have recently been proposed for identifying AI-generated text via watermarks or statistical irregularities. How robust are these detection algorithms to paraphrases of AI-generated text? To stress test these detectors, we first train an 11B parameter paraphrase generation model (DIPPER) that can paraphrase paragraphs, optionally leveraging surrounding text (e.g., us…
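The retrieval defense named in the title can be sketched in a few lines: the API provider stores every sequence it generates, and a candidate text is flagged if its nearest stored generation is sufficiently similar, which survives paraphrasing better than statistical detectors. The sketch below is illustrative only and not the paper's implementation; the class and method names are hypothetical, and a toy bag-of-words embedding stands in for a real semantic encoder.

```python
# Illustrative sketch of retrieval-based AI-text detection.
# All names here (RetrievalDetector, register_generation, is_ai_generated)
# are hypothetical; a real system would use a learned semantic encoder
# and an approximate-nearest-neighbor index instead of brute force.
from collections import Counter
import math


def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts (stand-in for a sentence encoder).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class RetrievalDetector:
    """Stores every sequence the API has generated; flags candidate text
    whose nearest stored generation exceeds a similarity threshold."""

    def __init__(self, threshold: float = 0.7):
        self.threshold = threshold
        self.corpus: list[Counter] = []

    def register_generation(self, text: str) -> None:
        # Called by the API provider each time the model emits text.
        self.corpus.append(embed(text))

    def is_ai_generated(self, text: str) -> bool:
        # Brute-force nearest-neighbor check against stored generations.
        q = embed(text)
        return any(cosine(q, doc) >= self.threshold for doc in self.corpus)
```

Because a paraphrase preserves most of the original's content words, its similarity to the stored generation stays high even when token-level statistics (the signal watermarking and perplexity-based detectors rely on) are destroyed.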

Cited by 24 publications (31 citation statements)
References 48 publications
“…The results showed that the existing detectors are not robust to the attacks, which emphasizes the need for more robust and reliable detectors to detect and avoid the misuse of LLMs. Krishna et al. [452] showed that existing detectors such as the OpenAI detector, GPTZero, and DetectGPT [463] are not robust to paraphrase attacks. For example, paraphrase attacks cause a drop of more than 65% in accuracy in the case of DetectGPT.…”
Section: Detecting GLLM Generated Text
Mentioning confidence: 99%
“…The performance of existing approaches like DetectGPT, ZeroGPT, the OpenAI detector, ChatGPT-detector-roberta, and ChatGPT-qa-detector-roberta is not satisfactory [437], [444]. Moreover, the existing approaches are not robust to various attacks like paraphrasing, synonym replacement, and writing-style modification [445], [452]. So, there is a great need for better approaches that can reliably detect GLLM-generated text and are also robust to various attacks, including paraphrasing.…”
Section: Robust Approaches To Detect GLLM Generated Text
Mentioning confidence: 99%
“…Unfortunately, existing detectors often perform poorly against simple attacks (e.g., paraphrasing), as highlighted by recent studies (Sadasivan et al. 2023; Krishna et al. 2023). A recent survey called for developing robust detection methods against other potential attacks designed to deceive the detectors (Tang, Chuang, and Hu 2023).…”
Section: Introduction
Mentioning confidence: 99%
“…Tools have been developed and are being continually refined to counteract the threat posed by AI-generated text to the integrity of assignments by assessing a block of text as being of human vs. AI authorship [50,51]. Additional techniques to detect attempts at evading detection are also being examined [52,53].…”
Section: Introduction
Mentioning confidence: 99%