Recognizing feelings in a conversation partner and replying accordingly is a key communicative skill, yet it remains a challenge for dialogue agents. While it is straightforward for humans to recognize and acknowledge others' feelings in a conversation, this is a significant challenge for AI systems, in part because of the paucity of suitable publicly available datasets for training and evaluation. This work proposes a new benchmark for empathetic dialogue generation and EMPATHETICDIALOGUES, a novel dataset of 25k conversations grounded in emotional situations. Our experiments indicate that dialogue models trained with our dataset are perceived as more empathetic by human evaluators than models trained only on large-scale Internet conversation data. We also present empirical comparisons of dialogue model adaptations for empathetic responding that leverage existing models or datasets without requiring lengthy retraining of the full model.
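As a rough illustration of the kind of lightweight adaptation the abstract alludes to, the sketch below prepends a predicted emotion label to the dialogue context so that an existing dialogue model can be reused unchanged. The classifier interface, the label subset, and the toy example are assumptions made for illustration only, not the paper's released components.

```python
# Minimal sketch: augment the context with a predicted emotion tag so an
# already-trained dialogue model can condition on it; only a small emotion
# classifier needs training. All names below are hypothetical.
from typing import Callable

EMOTIONS = ["afraid", "angry", "grateful", "joyful", "sad", "surprised"]  # subset, for illustration


def prepend_emotion(context: str, classify_emotion: Callable[[str], str]) -> str:
    """Return the dialogue context prefixed with a predicted emotion label."""
    label = classify_emotion(context)
    return f"{label} {context}"


if __name__ == "__main__":
    # Stand-in classifier: a real system would use a supervised model trained
    # on the emotional situation descriptions in the dataset.
    toy_classifier = lambda text: "sad" if "lost" in text.lower() else "joyful"
    print(prepend_emotion("I lost my dog last week and I can't stop thinking about it.", toy_classifier))
```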
The inappropriate expansion and activation of autoreactive memory B cells and plasmablasts contributes to loss of self-tolerance in systemic lupus erythematosus (SLE). Defects in the inhibitory Fc receptor, FcγRIIB, have been shown to contribute to B cell activation and autoimmunity in several mouse models of SLE. In this paper, we demonstrate that expression of FcγRIIB is routinely up-regulated on memory B cells in the peripheral blood of healthy controls, whereas up-regulation of FcγRIIB is considerably decreased in memory B cells of SLE patients. This directly correlates with decreased FcγRIIB-mediated suppression of B cell receptor–induced calcium (Ca2+) response in those B cells. We also found substantial overrepresentation of African-American patients among those who failed to up-regulate FcγRIIB. These results suggest that the inhibitory receptor, FcγRIIB, may be impaired at a critical checkpoint in SLE in the regulation of memory B cells; thus, FcγRIIB represents a novel target for therapeutic interventions in this disease.
Generative dialogue models currently suffer from a number of problems that standard maximum likelihood training does not address. They tend to produce generations that (i) rely too heavily on copying from the context, (ii) contain repetitions within utterances, (iii) overuse frequent words, and (iv) at a deeper level, contain logical flaws. In this work we show how all of these problems can be addressed by extending the recently introduced unlikelihood loss (Welleck et al., 2019a) to these cases. We show that appropriate loss functions which regularize generated outputs to better match human distributions are effective for the first three issues. For the last, more general issue, we show that applying unlikelihood to collected data of what a model should not do improves logical consistency, potentially paving the way to generative models with greater reasoning ability. We demonstrate the efficacy of our approach across several dialogue tasks.
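To make the training signal concrete, the following is a minimal sketch of token-level unlikelihood training in the spirit of Welleck et al. (2019a): the usual maximum-likelihood term is combined with a penalty that pushes down the probability of "negative candidate" tokens. Using previously seen target tokens as the negative candidates (which discourages repetition) is one common instantiation; the exact candidate sets and weighting used in the paper may differ, and padding handling is omitted for brevity.

```python
# Sketch of MLE + unlikelihood loss in PyTorch; variable names are illustrative.
import torch
import torch.nn.functional as F


def mle_plus_unlikelihood(logits, targets, alpha=1.0, pad_id=0):
    """logits: (batch, seq_len, vocab) model scores; targets: (batch, seq_len) gold tokens."""
    log_probs = F.log_softmax(logits, dim=-1)

    # (i) standard maximum-likelihood term
    mle = F.nll_loss(
        log_probs.view(-1, log_probs.size(-1)),
        targets.view(-1),
        ignore_index=pad_id,
    )

    # (ii) unlikelihood term: penalize probability mass on negative candidates,
    # here chosen as the target tokens already seen earlier in the sequence.
    probs = log_probs.exp()
    batch, seq_len, vocab = probs.shape
    ul = torch.zeros((), device=probs.device)
    for t in range(1, seq_len):
        neg_mask = torch.zeros(batch, vocab, device=probs.device)
        neg_mask.scatter_(1, targets[:, :t], 1.0)          # mark earlier tokens as negatives
        neg_mask.scatter_(1, targets[:, t:t + 1], 0.0)     # never penalize the gold token
        p_neg = probs[:, t, :]
        ul = ul + (-torch.log(torch.clamp(1.0 - p_neg, min=1e-5)) * neg_mask).sum()
    ul = ul / (batch * seq_len)

    return mle + alpha * ul
```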
Warning: this paper contains example data that may be offensive or upsetting. Conversational agents trained on large unlabeled corpora of human interactions will learn patterns and mimic behaviors therein, which may include offensive or otherwise toxic behavior. We introduce a new human-and-model-in-the-loop framework for evaluating the toxicity of such models, and compare a variety of existing methods under both non-adversarial and adversarial users who expose their weaknesses. We then propose two novel methods for safer conversational agents: training on data from our new human-and-model-in-the-loop framework in a two-stage system, or "baking in" safety to the generative model itself. We find our new techniques are (i) safer than existing models while (ii) maintaining usability metrics such as engagingness relative to state-of-the-art chatbots. In contrast, we expose serious safety issues in existing standard systems such as GPT-2 (Radford et al., 2019), DialoGPT (Zhang et al., 2019), and BlenderBot (Roller et al., 2020).

De Angeli and Carpenter (2005) and De Angeli and Brahnam (2008) suggest that one in ten human–bot conversations may contain instances of the human demonstrating unprovoked abusive behavior towards the chatbot. Miller et al. (2017b) argued that adversarial attacks need to be expected and planned for when deploying a user-facing system that learns from its interactions. These findings suggest it is insufficient to merely exclude toxic data from training: the model would not know how to answer hostile out-of-domain inputs, and positive biases, where models tend to agree rather than contradict, would lead to undesirable outcomes. As shown in Gehman et al. (2020), training on sanitized data can decrease the amount of unprompted toxic content, yet still leave models vulnerable to generating toxic content in response to specific prompts.
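The sketch below illustrates the general shape of the two-stage idea described above: a generator proposes a reply and a separately trained safety classifier vetoes it, falling back to a canned, topic-changing response. The function names, fallback text, and stand-in components are assumptions for illustration, not the models or data released with the paper.

```python
# Minimal sketch of a classifier-in-the-loop safety layer around a generator.
from typing import Callable

SAFE_FALLBACK = "Hey, do you want to talk about something else? How about your favorite book?"


def safe_reply(
    context: str,
    generate: Callable[[str], str],
    is_unsafe: Callable[[str], bool],
) -> str:
    """Stage 1: screen the user message; Stage 2: screen the model's own candidate reply."""
    if is_unsafe(context):
        return SAFE_FALLBACK            # refuse to engage with offensive input
    candidate = generate(context)
    if is_unsafe(candidate):
        return SAFE_FALLBACK            # veto an unsafe generation
    return candidate


if __name__ == "__main__":
    # Trivial stand-ins; a real system would use a trained safety classifier and chatbot.
    blocklist = {"idiot", "stupid"}
    classifier = lambda text: any(word in text.lower() for word in blocklist)
    generator = lambda ctx: "That sounds interesting, tell me more!"
    print(safe_reply("You are an idiot", generator, classifier))  # -> canned fallback
```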
While dialogue remains an important end-goal of natural language research, the difficulty of evaluation is an oft-cited reason why real progress toward that goal remains elusive. Evaluation difficulties are in fact two-fold: not only do automatic metrics correlate poorly with human judgments, but human judgments themselves are also difficult to measure. The two most widely used human judgment tests, single-turn pairwise evaluation and multi-turn Likert scores, both have serious flaws, as we discuss in this work. We instead provide a novel procedure that compares two full dialogues, where a human judge is asked to pay attention to only one speaker within each and make a pairwise judgment. The questions themselves are optimized to maximize the robustness of judgments across different annotators, resulting in better tests. We also show how these tests work in self-play model-chat setups, resulting in faster, cheaper tests. We hope these tests become the de facto standard, and we will release open-source code to that end.
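As a small sketch of how pairwise judgments of the kind described above might be aggregated (this is not code from the paper), one can count how often annotators prefer model A over model B and test whether the observed win rate differs from chance with a binomial test; the numbers below are hypothetical.

```python
# Aggregate pairwise dialogue judgments and test for a significant preference.
from scipy.stats import binomtest


def win_rate(judgments):
    """judgments: list of 'A' or 'B' picks from independent annotators."""
    wins_a = sum(1 for j in judgments if j == "A")
    n = len(judgments)
    result = binomtest(wins_a, n, p=0.5)   # two-sided test against chance
    return wins_a / n, result.pvalue


if __name__ == "__main__":
    prefs = ["A"] * 64 + ["B"] * 36        # hypothetical matchup outcomes
    rate, p = win_rate(prefs)
    print(f"A preferred in {rate:.0%} of matchups (binomial test p = {p:.3f})")
```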