We present SECOND THOUGHTS, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain-of-edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, SECOND THOUGHTS not only achieves superior performance on three value-alignment benchmark datasets but also shows strong human-value transfer-learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and ease of interactive error correction. Extensive human evaluations further confirm its effectiveness.

We take the log-probability predicted by the LM, log Pr(y|x), i.e., the conditional log-probability of generating option y given input context x, and compute its exponential for better readability. Such a protocol is also adopted by BIG-Bench: https://github.com/google/BIG-bench.

… the source to produce the target (Figure 2(b)). This way, the model learns how to recover from a value-unaligned, poisoned context during the generation phase.
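As a rough illustration of the option-scoring protocol described in the note above, the conditional log-probability of an option given a context can be computed by summing the LM's token log-probabilities and then exponentiating. This is only a sketch, not the paper's evaluation code; the model name ("gpt2"), the helper function, and the tokenization details are assumptions.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative stand-in model; the paper's evaluated LMs may differ.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def option_score(context: str, option: str) -> float:
    """Return exp(log Pr(option | context)), summed over the option's tokens."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    opt_ids = tokenizer(option, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, opt_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probabilities at position i predict the token at position i+1.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    opt_positions = range(ctx_ids.size(1) - 1, input_ids.size(1) - 1)
    total = sum(
        log_probs[0, pos, input_ids[0, pos + 1]].item() for pos in opt_positions
    )
    return math.exp(total)
```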
Augmented Edits Modeling

DP-based Edits Inference. Given two text strings, source and target, one can find unlimited ways to edit source to produce target. We therefore place two constraints on the editing: (1) the edits must be combinations of generic editing operations, namely inserting, deleting, and replacing a single token; (2) each edit operation has a cost, and our goal is to infer the chain-of-edits with minimum total cost. Under these constraints, the edits-inference problem can be converted into a token-level "edit distance problem" (Jurafsky, 2000), which can be solved by dynamic programming (DP). We modify the algorithm to accept customized editing costs (e.g., insert: 1, delete: 1, replace: 2) so as to model different editing preferences (see the sketch below). We use special tokens to mark the start/end of an edit and the new content to be inserted/replaced, and we develop a decipher module that translates the edit operations produced by DP into natural language (see §A.1 for a visualization of the whole process, and §A.3 for more discussion of edit-based models).

Augmented Edits Modeling (AEM). To augment the edits, we run the DP algorithm on the same source-target pairs with a variety of editing costs to create a collection of chains-of-edits for each source-target pair, which we call positive demonstrations (y+). We then fine-tune an LM on these source-edits-target text inputs (recall that the edits are turned into natural language). We call this Augmented Edits Modeling (AEM). Unlike common language modeling, AEM includes the labor-free decomposition (i.e., the editing steps) in the training objective, whereas prior works either train on costly, manually created decompositions (Ouyang et al., 2022) or, rather than training, prompt with such decompositions (Nye et al., 2021). We also construct negative demonstrations (y−) by us...
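To make the DP-based edits inference concrete, the following is a minimal token-level sketch with configurable operation costs; the function name, the default costs, and the tuple format of the returned edits are illustrative choices rather than the paper's implementation.

```python
def chain_of_edits(source, target, ins_cost=1, del_cost=1, rep_cost=2):
    """Token-level minimum-cost edit script from source to target via DP.

    Costs are configurable, so different settings can yield different
    (all valid) chains of edits for the same source-target pair.
    """
    n, m = len(source), len(target)
    # dp[i][j] = minimum cost to turn source[:i] into target[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * del_cost
    for j in range(1, m + 1):
        dp[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if source[i - 1] == target[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = min(
                    dp[i - 1][j] + del_cost,      # delete source[i-1]
                    dp[i][j - 1] + ins_cost,      # insert target[j-1]
                    dp[i - 1][j - 1] + rep_cost,  # replace source[i-1]
                )
    # Backtrace to recover the chain of edit operations.
    edits, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and dp[i][j] == dp[i - 1][j - 1]):
            i, j = i - 1, j - 1                              # tokens match, no edit
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + rep_cost:
            edits.append(("replace", i - 1, target[j - 1]))  # position in source, new token
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + del_cost:
            edits.append(("delete", i - 1, source[i - 1]))
            i -= 1
        else:
            edits.append(("insert", i, target[j - 1]))
            j -= 1
    return list(reversed(edits))
```

Running this with different cost settings (e.g., insert: 1, delete: 1, replace: 2 versus insert: 2, delete: 1, replace: 1) on the same pair can produce different, equally valid chains, which is exactly what the augmentation step exploits.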
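The augmentation and serialization into source-edits-target training text could then look roughly like the sketch below, which reuses chain_of_edits from the previous sketch. The cost settings, the [EDITS]/[TARGET] markers, and the decipher templates are all placeholders, not the paper's actual special tokens or wording.

```python
# Hypothetical cost settings used to augment each (source, target) pair.
COST_SETTINGS = [
    {"ins_cost": 1, "del_cost": 1, "rep_cost": 2},
    {"ins_cost": 2, "del_cost": 1, "rep_cost": 1},
    {"ins_cost": 1, "del_cost": 2, "rep_cost": 1},
]

def decipher(edit) -> str:
    """Translate one DP edit operation into a natural-language instruction.

    The templates are illustrative, not the paper's exact decipher module.
    """
    op, pos, tok = edit
    if op == "insert":
        return f"insert '{tok}' at position {pos}"
    if op == "delete":
        return f"delete '{tok}' at position {pos}"
    return f"replace the token at position {pos} with '{tok}'"

def build_positive_demonstrations(source_tokens, target_tokens):
    """Create positive (y+) chain-of-edits demonstrations for one pair."""
    demos = []
    for costs in COST_SETTINGS:
        # chain_of_edits is defined in the sketch above.
        edits = chain_of_edits(source_tokens, target_tokens, **costs)
        edits_text = "; ".join(decipher(e) for e in edits)
        demos.append(
            " ".join(source_tokens)
            + " [EDITS] " + edits_text            # placeholder markers
            + " [TARGET] " + " ".join(target_tokens)
        )
    return demos
```

Each resulting string would serve as one source-edits-target fine-tuning example for AEM.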