2023
DOI: 10.48550/arxiv.2303.07205
Preprint

The Science of Detecting LLM-Generated Texts

Abstract: The emergence of large language models (LLMs) has resulted in the production of LLM-generated texts that are highly sophisticated and almost indistinguishable from texts written by humans. However, this has also sparked concerns about the potential misuse of such texts, such as spreading misinformation and causing disruptions in the education system. Although many detection approaches have been proposed, a comprehensive understanding of the achievements and challenges is still lacking. This survey aims to provi…

Cited by 19 publications (17 citation statements)
References 28 publications
“…Some common choices are based on the perplexity of the given text (Dhaini et al, 2023; Ghosal et al, 2023; Tang et al, 2023). The general idea behind this approach is that texts generated by LLMs tend to have lower perplexities.…”
Section: Results
confidence: 99%
“…Detectors that use hidden statistical patterns have become advantageous in this context, as they require no knowledge of the specific LLMs used and little to no training at all. Some common choices are based on the perplexity of the given text (Dhaini et al, 2023; Ghosal et al, 2023; Tang et al, 2023). The general idea behind this approach is that texts generated by LLMs tend to have lower perplexities.…”
Section: Binoculars Scores Before and After The Release Of ChatGPT
confidence: 99%
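The perplexity-based idea quoted above can be sketched in a few lines. Perplexity is the exponential of the average negative log-likelihood a scoring language model assigns to the tokens of a text; detectors of this family flag texts whose perplexity falls below a threshold. In practice the per-token probabilities would come from a real language model (e.g. GPT-2); the probability values and the threshold below are purely illustrative assumptions, not calibrated figures.

```python
import math

def perplexity(token_probs):
    """Perplexity of a text given the per-token probabilities assigned by
    some language model: exp of the average negative log-likelihood."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

def looks_llm_generated(token_probs, threshold=5.0):
    """Flag a text as likely LLM-generated when its perplexity is low,
    i.e. the scoring model finds it unusually predictable.
    The threshold value here is a hypothetical illustration."""
    return perplexity(token_probs) < threshold

# Hypothetical per-token probabilities from a scoring model:
predictable = [0.4, 0.5, 0.3, 0.45]   # uniformly likely tokens (LLM-like)
surprising  = [0.05, 0.4, 0.02, 0.3]  # occasional improbable tokens (human-like)

print(perplexity(predictable))   # low perplexity  -> flagged
print(perplexity(surprising))    # higher perplexity -> not flagged
```

Note that real detectors must also calibrate the threshold per domain and per scoring model, since perplexity varies with genre, language, and model size.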
“…The susceptibility to adversarial attacks [242], ethical concerns [75], difficulty with context-dependent language [249], absence of emotion and sentiment analysis [513], limited multilingual capabilities [514], limited memory [515], lack of creativity [413], and restricted real-time capabilities [489] are also critical concerns. The high costs of training and maintenance, limited scalability, lack of causality, inadequate ability to handle multimodal inputs, limited attention span, limited transfer learning capabilities, insufficient understanding of the world beyond text, inadequate comprehension of human behavior and psychology, limited ability to generate long-form text, restricted collaboration capabilities, limited ability to handle ambiguity, inadequate understanding of cultural differences, limited ability to learn incrementally, limited ability to handle structured data, and limited ability to handle noise or errors in input data [516], [517], [518], [519], [520], [521], [92] are some of the key challenges in safe, responsible, and efficient deployment of LLMs.…”
Section: Challenges and Limitations Of Large Language Models
confidence: 99%
“…On a similar vein, Tornede et al [61] explore LLMs in the context of Automated Machine Learning (AutoML) techniques, discussing existing methodologies and the challenges of using them to enhance LLM performance. Tang et al [62] focus on techniques for detecting text generated by LLMs, while Chang et al [29] have examined the various ways to evaluate LLMs. Additionally, there are a number of surveys dedicated to investigating the specialised applications of Large Models in various fields such as vision [23,24,32,33], education [34][35][36][37]63], healthcare [38,39], computational biology [42,43], computer programming [64,65], law [44][45][46]66], or robotics [47,67,68] among others.…”
Section: Speech
confidence: 99%