Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

Liu, Ruibo; Jia, Chenyan; Zhang, Ge; Zhuang, Ziyu; Liu, Tony X; Vosoughi, Soroush

doi:10.48550/arxiv.2301.00355

Cited by 3 publications

(7 citation statements)

References 24 publications


“…The inclusion of colourblind or otherwise impaired road users in traffic indeed represents an important human value, and perhaps also one that a human researcher would have wanted to consider; however, strictly speaking, it was not something that was prominently featured in the 30 summaries and therefore did not emerge in the human content analysis. The challenge of alignment, that is, to what extent the output that large language models generate aligns with human values, is an ongoing research topic [80, 81]. It can also be noted here that these values may be context-dependent, whereby in some cases, a researcher might want to receive a mechanistic output, such as for a meta-analysis, and in other cases, they would prefer an output that takes into account more general human values such as safety and inclusiveness.…”

Section: Discussion (mentioning)

confidence: 99%

Using ChatGPT for human–computer interaction research: a primer

Tabone,

de Winter

2023

R. Soc. open sci.


ChatGPT could serve as a tool for text analysis within the field of Human–Computer Interaction, though its validity requires investigation. This study applied ChatGPT to: (1) textbox questionnaire responses on nine augmented-reality interfaces, (2) interview data from participants who experienced these interfaces in a virtual simulator, and (3) transcribed think-aloud data of participants who viewed a real painting and its replica. Using a hierarchical approach, ChatGPT produced scores or summaries of text batches, which were then aggregated. Results showed that (1) ChatGPT generated sentiment scores of the interfaces that correlated extremely strongly ( r > 0.99) with human rating scale outcomes and with a rule-based sentiment analysis method (criterion validity). Additionally, (2) by inputting automatically transcribed interviews to ChatGPT, it provided meaningful meta-summaries of the qualities of the interfaces (face validity). One meta-summary analysed in depth was found to have substantial but imperfect overlap with a content analysis conducted by an independent researcher (criterion validity). Finally, (3) ChatGPT's summary of the think-aloud data highlighted subtle differences between the real painting and the replica (face validity), a distinction corresponding with a keyword analysis (criterion validity). In conclusion, our research indicates that, with appropriate precautions, ChatGPT can be used as a valid tool for analysing text data.


“…Some researchers found that only a small proportion of the whole response (e.g., one or two words) needs to be fixed. Thus, an edition module takes effect after the generation to fix some problems in some works [75, 78]. Similarly, text style transfer or rephrasing from toxicity to non-toxicity can also be plugged in at this stage [26, 66].…”

Section: Towards Pipeline-based System (mentioning)

confidence: 99%

Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey

Deng,

Sun,

Zhang

et al. 2023

Preprint


With the development of artificial intelligence, dialogue systems have been endowed with amazing chitchat capabilities, and there is widespread interest and discussion about whether the generated contents are socially beneficial. In this paper, we present a new perspective of research scope towards building a safe, responsible, and moral dialogue system, including 1) abusive and toxic contents, 2) unfairness and discrimination, 3) ethics and morality issues, and 4) risk of misleading and privacy information. Besides, we review the mainstream methods for evaluating the safety of large models from the perspectives of exposure and detection of safety issues. The recent advances in methodologies for the safety improvement of both end-to-end dialogue systems and pipeline-based models are further introduced. Finally, we discuss six existing challenges towards responsible AI: explainable safety monitoring, continuous learning of safety issues, robustness against malicious attacks, multimodal information processing, unified research framework, and multidisciplinary theory integration. We hope this survey will inspire further research toward safer dialogue systems.


“…The first group generally seeks value alignment, i.e., some notion of steering language models towards producing societally-desirable text (Liu et al, 2021a). We note a variety of vague goals such as to reduce "non-normative" (Peng et al, 2020) or "immoral" text (Liu et al, 2023c); to generate more "pro-social" or "legitimate" text (Bakker et al, 2022); or to encourage that LLM technologies have a "positive impact on society" (Liu et al, 2023b). Specific motivations include minimising toxic or offensive language (Dinan et al, 2019; Xu et al, 2021a; Ju et al, 2022; Scheurer et al, 2022; Korbak et al, 2023); improving safety (Liu et al, 2021a; Thoppilan et al, 2022; Ganguli et al, 2022; Jin et al, 2022); adapting to ethical or moral scenarios (Forbes et al, 2020; Jin et al, 2022); or achieving political ideological balance (Liu et al, 2021b).…”

Section: Conceptual Classification (mentioning)

confidence: 99%

“…Explicit comparisons collected on model outputs are used to reveal the preferences of human raters (Gao et al, 2018; Ziegler et al, 2019; Askell et al, 2021; Jaques et al, 2020; Stiennon et al, 2020; Ganguli et al, 2022; Glaese et al, 2022). More fine-grained feedback includes binary or Likert scale questions on text attributes (Nakano et al, 2021; Menick et al, 2022; Thoppilan et al, 2022); natural language comments (Ju et al, 2022; Scheurer et al, 2022); or edits (Hancock et al, 2019; Liu et al, 2023c). Ideal demonstrations are used to ground norm-dependent or ethical judgements (Forbes et al, 2020; Pyatkin et al, 2022; Jin et al, 2022), or in combination with ratings to prime model behaviour (Nakano et al, 2021; Wu et al, 2021; Ouyang et al, 2022; Bakker et al, 2022).…”

Section: Collecting Feedback (mentioning)

confidence: 99%

“…(Step 4): Fine-tune a RL policy (another LM) that generates text autoregressively, whilst the PM provides a reward signal. Often, the policy is updated using the PPO algorithm (Ziegler et al, 2019;Stiennon et al, 2020;Nakano et al, 2021) and a KL-penalty term is applied to control deviations from the base model (Jaques et al, 2019;Ziegler et al, 2019;Stiennon et al, 2020;Nakano et al, 2021;Menick et al, 2022;Ouyang et al, 2022;Liu et al, 2023c). This pipeline can be implemented in offline, online or batched settings (see Ziegler et al, 2019).…”

Section: Integrating Feedback (mentioning)

confidence: 99%
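
The "Integrating Feedback" statement above mentions a reward signal from a preference model (PM) combined with a KL-penalty term that limits drift from the base model. A minimal sketch of how such a penalized reward could be computed is given below. It is an illustration only, not the method of any cited paper: the function name `kl_penalized_reward`, the coefficient `beta`, and its default value are assumptions, and the KL term is approximated from per-token log-probabilities of one sampled response.

```python
def kl_penalized_reward(pm_score, policy_logprobs, base_logprobs, beta=0.1):
    """Return the PM score minus a scaled KL estimate (hypothetical helper).

    pm_score: scalar score from the preference model for one response.
    policy_logprobs / base_logprobs: per-token log-probabilities of the
    same sampled response under the policy and under the base model.
    beta: assumed penalty coefficient; the value here is illustrative.
    """
    if len(policy_logprobs) != len(base_logprobs):
        raise ValueError("log-probability sequences must have equal length")
    # Single-sample estimate of KL(policy || base): sum of log-ratios
    # over the tokens of the sampled response.
    kl_estimate = sum(p - b for p, b in zip(policy_logprobs, base_logprobs))
    return pm_score - beta * kl_estimate
```

With identical log-probabilities under both models the penalty vanishes and the reward equals the PM score; the more the policy's probabilities for its own samples exceed the base model's, the larger the deduction.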


The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

Kirk,

Bean,

Vidgen

et al. 2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing


Human feedback is increasingly used to steer the behaviours of Large Language Models (LLMs). However, it is unclear how to collect and incorporate feedback in a way that is efficient, effective and unbiased, especially for highly subjective human preferences and values. In this paper, we survey existing approaches for learning from human feedback, drawing on 95 papers primarily from the ACL and arXiv repositories. First, we summarise the past, pre-LLM trends for integrating human feedback into language models. Second, we give an overview of present techniques and practices, as well as the motivations for using feedback; conceptual frameworks for defining values and preferences; and how feedback is collected and from whom. Finally, we encourage a better future of feedback learning in LLMs by raising five unresolved conceptual and practical challenges.
