Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence 2023
DOI: 10.24963/ijcai.2023/694
Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

Abstract: Recent research has focused on using large language models (LLMs) to generate explanations for hate speech through fine-tuning or prompting. Despite the growing interest in this area, the effectiveness and potential limitations of these generated explanations remain poorly understood. A key concern is that these explanations, generated by LLMs, may lead to erroneous judgments about the nature of flagged content by both users and content moderators. For instance, an LLM-generated explanation might inaccurately convin…

Cited by 9 publications (2 citation statements)
References: 0 publications
“…According to the study, few-shot learning models may achieve up to 80% accuracy, while zero- and one-shot learning models only reached an accuracy range of 45%–72%. The results underscored the significant role that large language models play in detecting hate and toxic speech, advocating for further development to counter toxic content and foster a safer online environment (Wang et al., 2023). Large language models (LLMs) have been used extensively in recent work to produce explanations for hate speech; this is typically accomplished by techniques such as prompting or fine-tuning.…”
Section: Deep Learning Techniques
confidence: 99%
“…Also, some works reveal that ChatGPT evaluation produces results similar to expert human evaluation [8,13]. Therefore, we engage ChatGPT to evaluate the quality of explanations based on four metrics that have been widely employed in human evaluation [36,40]: misleadingness, informativeness, soundness, and readability. A 5-point Likert scale was employed, where 1 represented the poorest and 5 the best, except for misleadingness.…”
Section: Evaluations On Explanation
confidence: 99%
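
The LLM-as-judge setup described in that citation statement can be illustrated with a short sketch. The snippet below is a hypothetical illustration, not the cited paper's protocol: the prompt wording, the model name, and the JSON parsing are assumptions; only the four metrics and the 5-point Likert scale come from the statement above.

# Hypothetical sketch: asking a chat LLM to rate a generated hate-speech
# explanation on a 1-5 Likert scale for the four metrics named above.
# Prompt wording, model choice, and parsing are illustrative assumptions.
import json
from openai import OpenAI

METRICS = ["misleadingness", "informativeness", "soundness", "readability"]

def rate_explanation(client: OpenAI, post: str, explanation: str) -> dict:
    prompt = (
        "You are evaluating an explanation of why a social-media post was "
        "flagged as hateful.\n\n"
        f"Post: {post}\n"
        f"Explanation: {explanation}\n\n"
        "Rate the explanation on a 1-5 Likert scale for each metric: "
        + ", ".join(METRICS)
        + ". Reply with only a JSON object mapping each metric to an integer."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the paper evaluates GPT-3-family outputs
        temperature=0,          # deterministic scoring
        messages=[{"role": "user", "content": prompt}],
    )
    # Naive parsing: assumes the model returns bare JSON as instructed.
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    scores = rate_explanation(
        client,
        post="<flagged post text>",
        explanation="<model-generated explanation>",
    )
    print(scores)

In such a setup, averaging the returned scores over many post/explanation pairs gives per-metric quality estimates comparable to the human Likert ratings the citing work references.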