2021
DOI: 10.7717/peerj-cs.443

Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover

Abstract: The recent improvements of language models have drawn much attention to potential cases of use and abuse of automatically generated text. Great effort is put into the development of methods to detect machine generations among human-written text in order to avoid scenarios in which the large-scale generation of text with minimal cost and effort undermines the trust in human interaction and factual information online. While most of the current approaches rely on the availability of expensive language models, we …


Cited by 54 publications (33 citation statements)
References 23 publications
“…Research has shown that Chat-GPT expresses significantly less negative emotion and hate speech compared to human-authored texts. Stylistic features or stylometry, including repetitiveness, lack of purpose, and readability, are also known to harbor valuable signals for detecting LLM-generated texts [13]. In addition to analyzing single texts, numerous linguistic patterns can be found in multi-turn conversations [3].…”
Section: Statistical Disparities
confidence: 99%
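
The stylometric signals mentioned in the statement above (repetitiveness, readability) translate directly into simple hand-crafted features. The sketch below is a minimal Python illustration, not code from the cited work; the specific features (type-token ratio, average sentence length, average word length) and the crude regex-based segmentation are illustrative assumptions.

# Minimal, illustrative sketch of hand-crafted stylometric features.
# The feature choices (type-token ratio, average sentence/word length)
# are assumptions for illustration, not the exact feature set of [13].
import re
from typing import Dict

def stylometric_features(text: str) -> Dict[str, float]:
    """Compute a few simple stylometric signals for a single document."""
    # Crude segmentation; a real pipeline would use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    if not tokens:
        return {"type_token_ratio": 0.0, "avg_sentence_len": 0.0, "avg_word_len": 0.0}
    return {
        # Repetitiveness: a lower type-token ratio means more repeated wording.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Readability proxies: longer sentences and longer words read harder.
        "avg_sentence_len": len(tokens) / max(len(sentences), 1),
        "avg_word_len": sum(len(t) for t in tokens) / len(tokens),
    }

if __name__ == "__main__":
    sample = "The model writes. The model writes again. The model writes the same thing."
    print(stylometric_features(sample))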
“…Some of the commonly used algorithms are Support Vector Machines, Naive Bayes, and Decision Trees. For instance, Fröhling et al. utilized linear regression, SVM, and random forest models built on statistical and linguistic features to successfully identify texts generated by GPT-2, GPT-3 and Grover models [13]. Similarly, Solaiman et al. achieved solid performance in identifying texts generated by GPT-2 through a combination of TF-IDF unigram and bigram features with a logistic regression model [35].…”
Section: Traditional Classification Algorithms
confidence: 99%
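
The baselines described in this statement can be reproduced with standard tooling. The following is a minimal scikit-learn sketch of a TF-IDF (unigram + bigram) plus logistic regression detector; the toy corpus, labels, and hyperparameters are illustrative assumptions rather than the setups used by Fröhling et al. or Solaiman et al.

# Minimal sketch of a TF-IDF (unigram + bigram) + logistic-regression
# detector in the spirit of the baseline described above. The toy corpus
# and hyperparameters are illustrative assumptions, not the cited setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: label 1 = machine-generated, 0 = human-written.
texts = [
    "the model produces fluent but generic prose about the topic",
    "the model produces fluent but generic prose about the issue",
    "i scribbled this note quickly before lunch, typos and all",
    "honestly the match last night was chaos, what a finish",
]
labels = [1, 1, 0, 0]

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigram + bigram TF-IDF features
    LogisticRegression(max_iter=1000),    # linear classifier on top
)
detector.fit(texts, labels)

# Score an unseen snippet; in practice the detector would be trained on a
# large labelled corpus and evaluated on held-out data, not the training set.
print(detector.predict_proba(["the model produces fluent but generic text"]))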
“…ChatGPT and other language generation models based on deep learning techniques, such as GPT-3, can be used for various natural language processing tasks, including medical writing. However, it is essential to note that using AI-generated text in the medical field requires careful consideration and review by medical experts to ensure the accuracy and reliability of the generated text [13].…”
Section: Editorial
confidence: 99%
“…Furthermore, as this approach was only applied to text produced by a particular version of Google Translate, these methods have been untested against current state-of-the-art text generation networks. Recent work has used a feature-based approach to the detection and characterization of GPT-2, GPT-3 and Grover datasets using a variety of text features, but intentionally avoids modern neural language models in the analysis, and does not consider adversarial robustness [20].…”
Section: B. Detection of Computer-generated Text
confidence: 99%