Findings of the Association for Computational Linguistics: EMNLP 2023
DOI: 10.18653/v1/2023.findings-emnlp.700

InstructSafety: A Unified Framework for Building Multidimensional and Explainable Safety Detector through Instruction Tuning

Zhexin Zhang, Jiale Cheng, Hao Sun, et al.

Abstract: Safety detection has been an increasingly important topic in recent years and it has become even more necessary to develop reliable safety detection systems with the rapid development of large language models. However, currently available safety detection systems have limitations in terms of their versatility and interpretability. In this paper, we first introduce INSTRUCTSAFETY, a safety detection framework that unifies 7 common sub-tasks for safety detection. These tasks are unified into a similar form throu…

Cited by 1 publication
References 35 publications