2022
DOI: 10.48550/arxiv.2212.08073
Preprint

Constitutional AI: Harmlessness from AI Feedback

Cited by 77 publications (80 citation statements); References 0 publications

“…New Directions: Concurrent and future work is beginning to explore two new directions: (a) expanding task diversity even more aggressively with synthetic data generation, particularly in creative and open-ended dialogue (Wang et al, 2022b; Honovich et al, 2022; Ye et al, 2022), and (b) offering human feedback signals on model responses (Ouyang et al, 2022; Glaese et al, 2022; Bai et al, 2022a; Bai et al, 2022b). We view most of these new directions as likely additive to a foundation of instruction tuning methods.…”
Section: Public Instruction Tuning Collections (mentioning)
confidence: 99%
“…Moreover, reinforcement learning from human feedback (RLHF) is then applied to better elicit the LLM's internal knowledge and align it with human values [96,137]. Based on RLHF, Bai et al [7] designed an RL-from-AI-feedback paradigm to obtain a more harmless (and still helpful) language model. Inference Phase.…”
Section: Towards E2E Conversational Model (mentioning)
confidence: 99%
“…As large-scale pre-trained LMs become integrated in more systems, it is a matter of utmost societal importance to make sure that such models adhere to shared human values (Bai et al, 2022; Liu et al, 2021d, 2022). Here, we present a light-weight framework that can align the generation of LMs with such values, without requiring new data or extensive prompt-engineering.…”
Section: Ethics, Broader Impact, and Reproducibility (mentioning)
confidence: 99%