Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

Krishna, Kalpesh; Song, Yining; Karpinska, Marzena; Wieting, John; Iyyer, Mohit

doi:10.48550/arxiv.2303.13408

Cited by 24 publications

(31 citation statements)

References 48 publications

“…The results showed that the existing detectors are not robust to the attacks, which emphasizes the need for more robust and reliable detectors to detect and avoid the misuse of LLMs. Krishna et al [452] showed that existing detectors like OpenAI detector, GPTZero and DetectGPT [463] are not robust to paraphrase attacks. For example, paraphrase attacks result in a drop of more than 65% accuracy in the case of DetectGPT.…”

Section: Detecting GLLM Generated Text (mentioning)

confidence: 99%

“…The performance of existing approaches like DetectGPT, ZeroGPT, OpenAI detector, ChatGPT-detector-roberta and ChatGPT-qa-detector-roberta is not satisfactory [437], [444]. Moreover, the existing approaches are not robust to various attacks like paraphrasing, synonym word replacement and writing style modification [445], [452]. So, there is a great need for better approaches which can reliably detect GLLM generated text and also robust to various attacks, including paraphrasing.…”

Section: Robust Approaches To Detect GLLM Generated Text (mentioning)

confidence: 99%

A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4

Kalyan

2023

SSRN Journal

“…Unfortunately, existing detectors often perform poorly against simple attacks (e.g., paraphrasing), as highlighted by recent studies (Sadasivan et al 2023;Krishna et al 2023). A recent survey called for developing robust detection methods against other potential attacks designed to deceive the detectors (Tang, Chuang, and Hu 2023).…”

Section: Introduction (mentioning)

confidence: 99%

OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples

Koike, Kaneko, Okazaki

2024

AAAI

Large Language Models (LLMs) have achieved human-level fluency in text generation, making it difficult to distinguish between human-written and LLM-generated texts. This poses a growing risk of misuse of LLMs and demands the development of detectors to identify LLM-generated texts. However, existing detectors lack robustness against attacks: they degrade detection accuracy by simply paraphrasing LLM-generated texts. Furthermore, a malicious user might attempt to deliberately evade the detectors based on detection results, but this has not been assumed in previous studies. In this paper, we propose OUTFOX, a framework that improves the robustness of LLM-generated-text detectors by allowing both the detector and the attacker to consider each other's output. In this framework, the attacker uses the detector's prediction labels as examples for in-context learning and adversarially generates essays that are harder to detect, while the detector uses the adversarially generated essays as examples for in-context learning to learn to detect essays from a strong attacker. Experiments in the domain of student essays show that the proposed detector improves the detection performance on the attacker-generated texts by up to +41.3 points F1-score. Furthermore, the proposed detector shows a state-of-the-art detection performance: up to 96.9 points F1-score, beating existing detectors on non-attacked texts. Finally, the proposed attacker drastically degrades the performance of detectors by up to -57.0 points F1-score, massively outperforming the baseline paraphrasing method for evading detection.

“…Tools have been developed and are being continually refined to counteract the threat posed by AI-generated text to the integrity of assignments by assessing a block of text as being of human vs AI authorship [50,51]. Additional techniques to detect attempts at evading detection are also being examined [52,53].…”

Section: Introduction (mentioning)

confidence: 99%

Exploring Ethical Boundaries: Can ChatGPT Be Prompted to Give Advice on How to Cheat in University Assignments?

Spennemann

2023

Preprint

Generative artificial intelligence (AI), in particular large language models such as ChatGPT, has reached public consciousness with a wide-ranging discussion of its capabilities and suitability for various professions. The extant literature on the ethics of generative AI revolves around its usage and application, rather than the ethical framework of the responses provided. In the education sector, concerns have been raised with regard to the ability of these language models to aid in student assignment writing, with potentially concomitant student misconduct if such work is submitted for assessment. Based on a series of ‘conversations’ with multiple replicates, using a range of discussion prompts, this paper examines the capability of ChatGPT to provide advice on how to cheat in assessments. Since its public release in November 2022, numerous authors have developed ‘jailbreaking’ techniques to trick ChatGPT into answering questions in ways other than the default mode. While the default mode activates a safety awareness mechanism that prevents ChatGPT from providing unethical advice, other modes partially or fully bypass this mechanism and elicit answers that are outside expected ethical boundaries. ChatGPT provided a wide range of suggestions on how to best cheat in university assignments, with some solutions common to most replicates (‘plausible deniability,’ ‘language adjustment of contract written text’). Some of ChatGPT’s solutions to avoid cheating being detected were cunning, if not slightly devious. The implications of these findings are discussed.
